1.0 INTRODUCTION


This report describes the results of the first year of an effort to develop a
model for the process of hypothesis generation. Our goal is to study and
model this process to provide information pertinent
to the general process of decision-problem structuring, in which hypothesis
generation plays a vital role. We hope that an understanding of the hypothesis
generation process of the decision maker will be useful in two general areas.
First, decision analysts can profit from an understanding of the heuristic
rules that their clients use to generate hypotheses. Second, if human
hypothesis generation is found to be deficient, then knowledge of the cause of
these deficiencies will be useful if hypothesis generation aiding is to be
provided.

This report first describes a tentative model for the hypothesis generation
process which we are evolving. This model separates the hypothesis generation
process into two sub-processes: one sub-process which describes the retrieval
of hypotheses from memory, and a second sub-process which describes how those
hypotheses that are retrieved from memory are evaluated. Two experiments are
described, each of which is devoted to one of these two sub-processes.

It should be emphasized at the outset that, because this hypothesis generation
model is ambitious in scope, many of its assumptions are as yet untested.
Our  basic  experimental  strategy  has  been  to  attempt  to  identify  those 
assumptions  and  questions  that  seem  most  critical  to  the  model  and  to  address 
these  questions  first.  For  this  reason,  this  version  of  our  model  should  not 
be  considered  to  be  our  definitive  effort.  We  have,  however,  found  it  to  be  a 
useful  guide  to  our  thinking  and  research,  and  we  hope  that  others  will  find 
it  so. 
2.0 A PROCESS MODEL FOR HYPOTHESIS GENERATION

Hypothesis generation is a vital precursor to the decision process. If the
Decision Maker fails to generate all of the relevant hypotheses, any
subsequent decisions made with an incomplete hypothesis set may be
inappropriate. For example, if a physician diagnoses a patient as having one of four
diseases when, in fact, the patient suffers from a fifth disease, then the
coherency of the medical diagnosis process has broken down, and subsequent
treatment may be ineffectual.

The process of hypothesis generation is poorly understood because of the
paucity of research on this topic. Much of the psychological research
investigating human decision making parallels the development of normative decision
models. As these models are basically algorithms which operate on the
structure of a decision model, it is not surprising that questions having to do
with how the Decision Maker generates these structures have been postponed.
Furthermore, the problem of developing normative models for hypothesis
generation has apparently been intractable and probably will remain so for the
foreseeable future.

Recent  work  in  cognitive  psychology,  however,  suggests  that  the  hypothesis 
generation  process  can  be  profitably  modeled  by  a combination  of  decision- 
theoretic  concepts  and  descriptive  theory.  The  hypothesis  generation  model 
discussed  here  employs  this  approach. 

This  hypothesis  generation  model  has  been  evolving  over  the  last  several 
years, and is tentative in nature. Experimental results are incorporated into
the  model  as  soon  as  they  become  available.  Hence,  the  model  is  constantly 
being  elaborated  and  modified  to  account  for  these  new  results. 

2.1  An  Overview  of  the  Hypothesis  Generation  Model 

This  model  is  to  be  applied  in  those  situations  in  which  the  decision  maker  is 
attempting  to  generate  hypotheses  that  will  account  for  the  available  data. 
For  example,  a physician  has  data  from  various  diagnostic  tests.  When  the 
physician  inspects  the  data,  various  diseases  (hypotheses)  may  come  to  mind. 
The process of generating new hypotheses can be modeled in the following way.

Assume that hypotheses are generated by a highly specific recursive memory
search (Shiffrin, 1970), which is controlled and guided by an executive
process (Newell and Simon, 1972). This executive process initiates,
directs, and terminates memory searches. It is further assumed that one
important input to the executive process is the plausibility of any hypothesis
currently held by the Decision Maker. Memory searches are assumed to be
initiated if no hypotheses currently exist, or if the plausibility of
hypotheses already retrieved from memory is so low as to require further search.

The  hypothesis  generation  process  begins  when  the  executive  becomes  aware  of 
the  need  to  generate  possible  explanations  for  data.  The  executive  directs 
and  controls  the  memory  search  and  plausibility  assessment  processes.  A 
memory  search  for  new  hypotheses  is  initiated  by  the  executive  based  on  the 
data  that  are  currently  available.  The  memory  search  process  may  retrieve  one 
or more hypotheses from memory. These hypotheses are returned to the
executive. Each hypothesis is then assessed for plausibility in light of the data
on  hand.  Finally,  those  hypotheses  that  survive  the  test  of  plausibility  are 
added  by  the  executive  to  the  current  hypothesis  set. 

If new data are received, the executive reassesses the plausibility of the
current hypothesis set, taking the new data into account. Hypotheses may be
dropped from the current hypothesis set at this time because new data render
them implausible. If the total or global plausibility of the entire current
hypothesis set becomes too low, the executive will initiate a further search
of memory, attempting to find additional hypotheses that are consistent with
both the old and new data. If these new hypotheses survive the plausibility
test, they are added to the current hypothesis set. Thus, the size of the
current hypothesis set increases or decreases as new data are incorporated in
the process, and the identity of hypotheses in the current hypothesis set
changes as new data become available. The process is recursive in the sense
that it may be repeated each time new data become available.
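
The cycle described above can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the function name `executive_cycle`, the two thresholds, the toy memory, and the toy plausibility score are all hypothetical and do not come from the model itself, which does not specify numerical criteria. The global plausibility of the set is taken here to be a simple sum over the retained hypotheses.

```python
from typing import Callable, Iterable, List, Set


def executive_cycle(
    current: Set[str],
    data: List[str],
    retrieve: Callable[[List[str]], Iterable[str]],
    plausibility: Callable[[str, List[str]], float],
    keep_threshold: float = 0.6,   # arbitrary illustrative value
    set_threshold: float = 1.0,    # arbitrary illustrative value
) -> Set[str]:
    """One pass of the executive: reassess, then search memory if needed."""
    # Reassess each current hypothesis against all data now on hand.
    kept = {h for h in current if plausibility(h, data) >= keep_threshold}
    # Global plausibility of the set (assumed here to be a plain sum).
    total = sum(plausibility(h, data) for h in kept)
    # An empty or weak set triggers a further memory search.
    if total < set_threshold:
        for h in retrieve(data):
            if plausibility(h, data) >= keep_threshold:
                kept.add(h)
    return kept


# Toy stand-ins for memory and plausibility (assumptions for illustration).
TOY_MEMORY = {"beef": ["Kansas", "Texas"], "fish": ["Maine", "Texas"]}


def toy_retrieve(data: List[str]) -> List[str]:
    return [state for d in data for state in TOY_MEMORY.get(d, [])]


def toy_plausibility(hypothesis: str, data: List[str]) -> float:
    # Fraction of the data associated with the hypothesis; not a probability.
    if not data:
        return 0.0
    return sum(hypothesis in TOY_MEMORY.get(d, []) for d in data) / len(data)


first = executive_cycle(set(), ["beef"], toy_retrieve, toy_plausibility)
second = executive_cycle(first, ["beef", "fish"], toy_retrieve, toy_plausibility)
```

In this toy run the first cycle starts from an empty set and therefore searches memory; the second cycle drops a hypothesis that the new datum makes less plausible and does not search again because the remaining set is still strong enough.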
2.2 How Hypotheses are Retrieved from Memory - The Proposed Hypothesis
Retrieval Process

Generally, we consider the act of retrieving hypotheses to involve semantic
memory (Tulving, 1972) since this process is thought to involve the retrieval
of factual information from the long-term memory store. At present we
have decided not to adhere to any one specific semantic memory theory.
Instead, concepts have been adopted from both network models such as Anderson
and Bower (1973) and set-theoretic models such as Smith, Shoben and Rips
(1974). The reason behind this decision is that our primary interest does
not lie in providing evidence for or against any specific theory of memory.
Rather, our goal is to study the generation of hypotheses in a way which can
be adapted to different models of memory.

We suppose that hypotheses are retrieved from memory using the relevant data
as retrieval cues. Based on data, the subject conducts a highly specific
memory search to retrieve hypotheses which can account for the available data.
Consequently, direct or indirect linkages must exist in memory between data
and hypotheses. There may be accessible information in the linkages
themselves. We assume that some of these linkages are associational in nature;
these linkages exist when the data directly suggest the hypothesis in
associative memory. Other linkages are indirect, or mediated. In these cases the
data are used to retrieve some intermediate event or variable which, in turn,
serves as an implicit retrieval cue for the hypotheses.

We suppose that hypotheses are rarely invented "de novo", but rather that the
availability of a hypothesis depends on its prior existence in memory and upon
the content, organization, and structure of the memory store. Decision Makers
must be able to exploit their factual knowledge which specifies the
relationship between data and hypotheses; usually data and hypotheses are directly or
indirectly related by facts retrieved from memory. In addition they should be
able to engage in accurate inductive and deductive reasoning, as the linkages
between hypotheses and data often are chains of indirect reasoning. Finally,
they must be able to recognize the similarities and the differences between
their present situation and past situations. We assume, therefore, that their
factual information store, the organization of this store, their reasoning
ability, and their ability to generalize from the past to the present are
major determinants of their performance.

The actual hypothesis retrieval process of Decision Makers is complicated
when  the  Decision  Maker  possesses  data  occurring  in  a novel  combination. 
Ideally  their  goal  is  to  retrieve  hypotheses  which  are  consistent  with  all  of 
the  known  data;  in  practice  they  may  settle  for  less  than  perfect  consistency. 
First  consider  the  case  where  N data  are  known  to  the  Decision  Maker.  An 
actual  hypothesis  generation  task  employed  in  the  present  research  will  be 
used  as  an  example  to  make  the  discussion  as  concrete  as  possible.  In  this 
task subjects were given the notable products and industries of one of the
fifty  American  States.  Their  task  was  to  retrieve  from  memory  States  which 
they  believed  might  have  generated  the  products  and  industries  given  as  data. 
They  were  instructed  to  search  their  memory  for  States  which  were  consistent 
with  all  the  data  given  and  to  respond  with  any  State  that  came  to  mind. 

Considering  States  retrieved  from  memory  as  possible  hypotheses,  our  goal  is 
to  develop  a model  for  the  retrieval  of  these  hypotheses  from  the  products  and 
industries  data  provided. 

Suppose that the task is to generate hypothesized States for the following
three products and industries: 1) Beef, 2) Fish, and 3) Aerospace. The concept
of each datum may be represented in memory as a node or point (Anderson and
Bower, 1973). The hypotheses that are yet to be retrieved can also be
similarly represented as points in this memory space. If subjects had only
one datum as a retrieval cue then their task would be much easier. Suppose
they were given the retrieval cue of Beef. Subjects who actually used beef as
a single-datum retrieval cue gave a wide variety of States as responses such
as: Kansas, Oklahoma, Texas, Colorado, etc. Similarly, using fish as a
single-datum retrieval cue led to many seaboard and some inland States,
including Maine, Florida, Texas and many others. Since aerospace industries
are located in many states, their responses were also varied, but frequently
included the two main NASA sites, Texas and Florida.


Figure 1. A highly simplified representation of the associational
network between hypotheses and data. Both direct and mediated associations
are shown.

Multiple-data retrieval is more difficult for the subject because the desired
hypotheses should be consistent with all the data. The subjects may encounter
a combination of data that is unique to them for which they have no habitual
answers. How do they generate hypotheses that satisfy the restrictions
imposed by the data?

Our  model  assumes  that  the  Decision  Maker  examines  each  datum  in  turn  and 
traces  out  the  network  of  direct  and  indirect  associations  between  that  datum 
and  many  hypotheses.  Some  hypotheses  are  associated  with  several  of  the  data, 
others  are  suggested  only  by  a single  datum.  A highly  simplified  version  of 
this process is schematically shown in figure 1.


INSERT  FIGURE  1 ABOUT  HERE 


We  assume  that  this  process  is  probabilistic;  figure  1 represents  what  might 
happen  on  a particular  trial  when  the  retrieval  cues  are  beef,  fish  and 
aerospace.  Direct,  or  non-mediated  retrievals  are  shown,  such  as  Kansas  or 
Montana. Also shown are indirect retrievals which involve more reasoning.
For example, the subjects may retrieve "SEACOAST" from fish. They then infer
that  coastal  states  probably  are  known  for  the  production  of  fish,  leading  to 
the  retrieval  of  Texas,  Florida,  and  California  from  the  category  of  coastal 
states. 

Direct retrievals are assumed to be habitual responses to data which have
become automatic through repetition. For example, Texas might be considered a
habitual response to beef. In terms of Shiffrin and Schneider (1977), such
direct retrievals would constitute an example of "automatic processing". In
such a case, the datum should automatically activate a hypothesis node without
any attention being allocated for such a retrieval. Direct retrievals should
thus occur as a function of the amount of practice or repetition which has
been devoted to the storage of the direct association between a given
hypothesis and any relevant data. In addition, direct retrievals should require a
minimum of conscious processing activity and attentional demands. However,
in situations where there is no direct association between a datum and
hypotheses stored in memory, an indirect retrieval may also suggest a hypothesis.
What we consider to be an indirect or mediated retrieval can be considered a
case of what Shiffrin and Schneider (1977) called "controlled processing".
In this situation, there is no habitual response to the data and the subject
is assumed to allocate attention to the activation of a sequence of nodes
which will eventually lead to the activation of an appropriate hypothesis
node. Such processing is considered to be controlled since it is actively
directed by the subject and requires attention. Mediated retrieval should
also be considered inferential since the relationship between the data and
hypotheses is often not stored directly and the subject has to infer this
relationship from concepts common to both the data and hypotheses.

Figure 2. A model of the hypothesis retrieval process.

The  states  that  are  retrieved  for  each  retrieval  cue  are  tagged  in  memory  as 
having  been  retrieved.  If  subjects  examine  a second  datum,  they  may  retrieve 
a State  that  was  previously  tagged  for  earlier  data.  We  assume  that  if  a 
subject retrieves a hypothesis having several tags, this hypothesis then
becomes a candidate for further analysis. The number of such tags of a
particular hypothesis that are necessary probably depends on the situation. In
figure 1, the hypothetical subject is shown as having tagged Texas for all
three data, Florida and California for two data, and all other states
once. Suppose the subject's criterion is two or more tags before
processing a hypothesis further. The subject would end the hypothesis
retrieval process with the hypotheses of Texas, Florida and California.
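
The tagging scheme can be illustrated with a short Python sketch. This is a deterministic simplification offered under stated assumptions, whereas the model treats retrieval as probabilistic. The association table below is invented to match the tag counts described for figure 1 and is not data from the experiments; the function name `retrieve_by_tags` is hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Illustrative associations only (an assumption standing in for figure 1).
ASSOCIATIONS: Dict[str, List[str]] = {
    "beef": ["Kansas", "Oklahoma", "Texas", "Montana"],
    "fish": ["Maine", "Texas", "Florida", "California"],
    "aerospace": ["Texas", "Florida", "California"],
}


def retrieve_by_tags(data: List[str], criterion: int = 2) -> List[str]:
    """Return hypotheses tagged by at least `criterion` of the data."""
    tags: Counter = Counter()
    for datum in data:
        # set() ensures one datum tags a given hypothesis at most once.
        for state in set(ASSOCIATIONS.get(datum, [])):
            tags[state] += 1
    return sorted(state for state, n in tags.items() if n >= criterion)


candidates = retrieve_by_tags(["beef", "fish", "aerospace"])
```

With a criterion of two tags the sketch returns California, Florida and Texas; raising the criterion to three leaves only Texas, which shows how the size of the candidate set depends on the criterion the subject adopts.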

If  the  subject  still  has  more  data  to  process  when  a hypothesis  has  the 
criterion  number  of  tags,  the  hypothesis  search  process  may  be  temporarily 
suspended, and the subject may begin a consistency checking process. We will
discuss the mechanism of consistency checking in more detail in Section 2.3
which follows. Consistency checking may occur when the retrieval process
locates a hypothesis having the criterion number of tags. This hypothesis is
returned to the executive for further processing; the Decision Maker is now
consciously aware of the candidacy of the hypothesis.

Consistency checking involves making a more limited memory search using two
retrieval cues: the datum and the hypothesis. This search is more specific
in nature than in hypothesis retrieval because it consists of a memory
search for relational information which supports or refutes the hypothesis.
As the hypothesis is now known and is used in conjunction with the datum as a
second retrieval cue, the search is more limited and attempts to retrieve
information relevant to both cues. Suppose that three additional data were
added to beef, fish and aerospace, namely 4) citrus products, 5) tourists
and 6) cypress products, and that the subject has just retrieved Texas from
the first three data. He could now search his memory to determine if Texas is
noted for citrus, and would probably retrieve information consistent with
Texas. Another consistency check would compare the fifth datum, tourists,
with Texas. Upon checking tourists an interesting complication arises. Most
subjects do not think of Texas as being noted for tourists, but Texas certainly
must have some tourists. We imagine that Texas would survive a consistency
check if the subject does not retrieve information that is inconsistent with
Texas. The datum "cypress products" is an example of a retrieval cue where
many subjects did not have information in memory which linked cypress and
Texas. It is tentatively assumed that if such information is lacking, the
hypothesis usually will be retained, but that a datum which is clearly
inconsistent with a hypothesis usually will cause the hypothesis to be discarded.
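
The retention rule just described can be written as a small Python sketch. It is a simplification under stated assumptions: the rule is applied strictly, although the text says only that it "usually" holds, and the knowledge table, including the refuting entry, is invented purely for illustration. The names `check` and `survives` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical relational knowledge, invented for illustration.
# True = supporting information, False = clearly inconsistent information.
# A missing entry means nothing relevant was retrieved from memory.
KNOWLEDGE: Dict[Tuple[str, str], bool] = {
    ("Texas", "citrus products"): True,
    ("Kansas", "fish"): False,
}


def check(hypothesis: str, datum: str) -> Optional[bool]:
    """Search memory with two cues; None means no information was found."""
    return KNOWLEDGE.get((hypothesis, datum))


def survives(hypothesis: str, remaining_data: List[str]) -> bool:
    """Discard only on clearly inconsistent information; otherwise retain."""
    return all(check(hypothesis, d) is not False for d in remaining_data)


texas_ok = survives("Texas", ["citrus products", "tourists", "cypress products"])
kansas_ok = survives("Kansas", ["beef", "fish"])
```

Under these assumptions Texas is retained even though nothing links it to tourists or cypress products, because absence of information is not treated as refutation, while a single clearly inconsistent datum is enough to discard a hypothesis.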

Figure 2 summarizes the major features of the hypothesis retrieval subprocess
of  the  total  model. 


INSERT  FIGURE  2 ABOUT  HERE 


The concept of memory tags is consistent with modern memory theory (Smith,
Shoben and Rips, 1974) where it is assumed that information is stored in
memory in associated clusters of attributes. For example, facts associated
with the "Bear D" aircraft, such as its flight characteristics, range, and
sensor systems, are stored in a "Bear D" cluster. A memory tag is an
additional fact which is added to this cluster and is a marker that a hypothesis
node was recently active in memory. As some hypothesis nodes will acquire
multiple tags because they are retrieved for several data, those tags provide
a mechanism for identifying hypotheses which are suggested by the collection
of data, an essential feature for any such model.

This notion of tagging by recency of activation is consistent with Shiffrin
and  Schneider  (1977)  who  consider  short-term  memory  as  being  a series  of 
activated  nodes  in  a network  of  otherwise  inactive  nodes.  If  a concept  node 
which  represents  a hypothesis  is  retrieved  it  is  temporarily  activated.  We 
assume  that  this  activation  can  serve  as  a tag  or  marker  which  indicates  that 
a given  hypothesis  has  been  related  to  a datum.  This  memory  tag  assumption  is 
an  attempt  to  provide  a psychological  process  explanation  for  multiple-data 
hypothesis  retrieval. 

A single datum generally evokes many hypotheses, most of which are
inconsistent with the remainder of the data. While we are assuming that hypotheses
are  tagged  in  memory,  many  other  similar  mechanisms  could  be  proposed  to 
accomplish  the  goal  of  retrieving  hypotheses  which  are  consistent  with  most  or 
all  of  the  data. 

The  consistency  checking  assumption  is  a new  addition  to  the  model,  and  for 
this  reason  is  not  a topic  in  the  present  research.  However,  we  believe  that 
this feature of the model will be particularly useful for describing
situations where the Decision Maker has large amounts of data to process. In these
situations  it  seems  logical  that  he  would  retrieve  a hypothesis  from  part  of 
the  data  and  check  its  consistency  with  the  remainder  of  the  data.  Such  a 
heuristic  strategy  would  save  considerable  time  and  thought  as  compared  to 
exhaustive  memory  searches  on  all  data. 
2.3 How Hypotheses are Evaluated - A Dual Plausibility Assessment Subprocess

When a hypothesis is retrieved from memory its plausibility is evaluated.
Some hypotheses are found to be plausible and are added to the set of
hypotheses that the Decision Maker is currently entertaining (the current hypothesis
set). Other hypotheses are found to be implausible and are discarded.

The  plausibility  assessment  process  is  dual.  First,  it  is  used  to  assess  each 
individual  hypothesis  to  decide  if  it  is  sufficiently  plausible  to  use  in  the 
current hypothesis set, as previously mentioned. Second, the plausibilities of
the individual hypotheses are cumulated to yield a plausibility for the entire
current hypothesis set. The plausibility of the current hypothesis set
controls the memory search process. Hypothesis retrievals from memory are
assumed to cease when the plausibility of the entire set of hypotheses is
sufficiently high.

Modeling  the  plausibility  estimation  process,  unlike  hypothesis  retrieval, 
can  profit  from  decision-theoretic  constructs.  Our  goal  is  to  apply  Bayes' 
Theorem  to  a process  that  has  different  characteristics  than  the  typical 
Bayesian  inference  task.  In  a Bayesian  inference  task,  the  hypotheses  are 
usually  known  and  enumerated.  Furthermore,  Decision  Makers  usually  assume 
that their hypothesis set is exhaustive; that they have enumerated all
hypotheses that are possible in the light of their data. In practice, the
Decision Makers may deliberately neglect possible but highly unlikely
hypotheses, or they may introduce a "catch-all" hypothesis (Edwards, 1966)
which  then  makes  the  hypothesis  set  exhaustive  by  definition. 

On the other hand, the Decision Maker in a hypothesis generation task begins
with no specific hypothesis in mind. The process starts with a
"yet-to-be-enumerated" set of all hypotheses, $H$, which are assumed to be possible for the
data, $D$; that is, for all $H_i$ in $H$, $P(D \mid H_i) > 0$.

Let:

$H$ = {finite set of all hypotheses $H_i$ for which $P(D \mid H_i) > 0$}

When hypotheses are generated by the Decision Maker the set $H$ can be
partitioned into two subsets:

$C$ = {subset of hypotheses currently under consideration by the Decision
Maker}

$\bar{C}$ = {subset of hypotheses not currently under consideration by the
Decision Maker}

It follows that $C \cup \bar{C} = H$, as any given hypothesis must be in $C$ or in $\bar{C}$. $C$
will be termed the "current hypothesis set".

The posterior plausibility, $P(H_i \mid D)$, is defined over the entire set $H$, and is
a Bayesian posterior probability:

1)  $P(H_i \mid D) = \dfrac{P(H_i)\,P(D \mid H_i)}{P(D)}$

where $P(D)$ is calculated over all $H_i$ in $H$, and $D$ stands for a datum, or a
collection of data.

We define the plausibility of $C$ using Bayes' theorem as:

2)  $P(C \mid D) = \dfrac{P(C)\,P(D \mid C)}{P(D)}$

The term $P(C \mid D)$ is the probability that one of the hypotheses in the current
hypothesis set has generated the data, and it can be readily calculated as:

3)  $P(C \mid D) = \sum_{H_i \in C} P(H_i \mid D)$,

where $P(H_i \mid D)$ are the posterior plausibilities of the hypotheses in $C$.
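
As a numerical illustration of equations 1 and 3, the following Python sketch computes the posterior plausibility of each hypothesis and then the plausibility of a current hypothesis set. The four hypotheses, their priors, and their likelihoods are made-up values chosen for easy arithmetic, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def posterior_plausibilities(
    priors: Dict[str, float], likelihoods: Dict[str, float]
) -> Dict[str, float]:
    """Equation 1: P(H_i|D) = P(H_i) P(D|H_i) / P(D), over all H_i in H."""
    # P(D) is the sum of prior times likelihood over the whole set H.
    p_d = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / p_d for h in priors}


def set_plausibility(posteriors: Dict[str, float], current: Iterable[str]) -> float:
    """Equation 3: P(C|D) is the sum of P(H_i|D) over the hypotheses in C."""
    return sum(posteriors[h] for h in current)


# Made-up values for a four-hypothesis set H (assumptions for illustration).
PRIORS = {"H1": 0.25, "H2": 0.25, "H3": 0.25, "H4": 0.25}
LIKELIHOODS = {"H1": 0.8, "H2": 0.4, "H3": 0.2, "H4": 0.2}

POST = posterior_plausibilities(PRIORS, LIKELIHOODS)
P_C = set_plausibility(POST, {"H1", "H2"})
```

With these values the posteriors are approximately 0.5, 0.25, 0.125 and 0.125, so a current set containing only the first two hypotheses has a plausibility of about 0.75, and the remaining 0.25 belongs to the hypotheses not yet under consideration.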

We  deliberately  use  the  term  plausibility  to  remind  the  reader  that  although 
the  concepts  are  Bayesian,  our  application  is  to  the  process  of  hypothesis 
generation, which requires the evaluation of quantities not usually evaluated
in Bayesian probabilistic inference. The Decision Maker's goal in a hypothesis
generation task is to generate plausible hypotheses for $C$. The Decision
Maker's goal in a probabilistic inference task is to evaluate the posterior
probabilities of hypotheses which have been previously generated. In hypothesis
generation tasks, $P(C \mid D)$ is initially zero, as the subset of hypotheses
currently under consideration is empty until the Decision Maker starts
hypothesis generation. After successful completion of a hypothesis generation task,
$P(C \mid D)$ may approach 1.0.

The Decision Maker has two alternatives available to deal with a probabilistic
inference task for which $P(C \mid D) < 1$. One approach is to assume $P(C \mid D) = 1$, thus
assuming that hypotheses not in $C$ may be neglected with a small probability of
error. For example, in a coin-flipping task, the Decision Maker may choose to
neglect the unlikely alternative of a coin landing on edge, and may choose to
define P(Heads) + P(Tails) = 1. Another approach is for the Decision Maker to
include a catch-all hypothesis in $C$, in which case $P(C \mid D) = 1$ and $P(\bar{C} \mid D) = 0$ by
definition. The emphasis in probabilistic inference is on the relative size
of the $P(H_i \mid D)$ probabilities for the various enumerated hypotheses. The
catch-all hypothesis is usually of secondary interest, and is often included solely
so that it can be claimed that the hypothesis set is exhaustive.

All hypotheses can be thought to originate from a "catch-all" hypothesis in
hypothesis generation, and the primary focus is on the size of the
probabilities of the enumerated hypotheses in $C$ in comparison to the un-enumerated
hypotheses in $\bar{C}$, which can be conceptualized as a giant catch-all hypothesis.
Our purpose, therefore, in introducing the term plausibility is to emphasize
the possible differences in psychological processes between hypothesis
generation tasks and probabilistic inference tasks, while simultaneously noting
the utility of Bayes' theorem in describing both tasks.

If it is assumed that the cost of entertaining additional hypotheses is
negligible, then the "Ideal Decision Maker" should entertain all hypotheses
for which $P(D \mid H_i) > 0$. This logically implies that $P(C \mid D) = 1.0$. Therefore, the
set of hypotheses for the "Ideal Decision Maker" may be very large. Human
Decision Makers, on the other hand, have non-negligible costs for adding
low-probability hypotheses to their current hypothesis set: costs of increasing
their information-processing burden, and costs of collecting information
necessary to evaluate these additional hypotheses. Finally, and most
importantly, the human Decision Maker must retrieve hypotheses from memory in order
to evaluate them. These considerations suggest that a human Decision Maker
will generate a less-than-exhaustive hypothesis set. Next to be discussed are
our previous research results (Gettys and Fisher, in preparation) bearing on
the plausibility assessment model, which have identified several heuristic
decision rules used in plausibility assessment.

2.4 Past Research on the Plausibility Model

Our  model  of  human  plausibility  assessment  assumes  that  Decision  Makers  base 
their  decisions  on  whether  or  not  to  include  a hypothesis  in  the  current 
hypothesis  set  on  its  subjective  plausibility.  It  further  assumes  that  the 
memory  retrieval  process  is  controlled  by  a second  subjective  plausibility, 
namely the plausibility of the entire current hypothesis set. The quantity
$P(H_i \mid D)$ is the normative counterpart of the subjective plausibility of a
single hypothesis, and $P(C \mid D)$ has the same relationship to the subjective
plausibility of the entire current hypothesis set. The Gettys-Fisher study
primarily attempted to discover how subjects utilize their subjective
plausibilities to 1) decide if a hypothesis should be included in the current
hypothesis  set  and  2)  to  determine  if  the  hypothesis  retrieval  process  should 
be  initiated  or  terminated. 

Subjects in the Gettys-Fisher experiment worked three hypothesis generation
tasks, one of which has been previously discussed. Subjects estimated
posterior odds for all hypotheses that they had generated. The magnitude of
posterior odds for new hypotheses on the trial when they were introduced was
found to be related to the odds of the most probable of the old hypotheses.
The results suggested that subjects employ a heuristic rule of adding a
hypothesis to the current hypothesis set only if its plausibility is high
enough to make it a strong contender. In fact, 90% of the hypotheses
introduced were at least half as likely as the most plausible hypothesis, and the
modal new hypothesis was evaluated either as the most plausible hypothesis or
as plausible as the most plausible hypothesis. This heuristic may have
profound implications for hypothesis generation. It suggests that the basic
strategy is to search for hypotheses that will be "leading contenders" in the
current hypothesis set. The decision to include a new hypothesis in this set
is primarily governed by what is already in the set. Presumably, the subject
is comparing his candidate hypothesis with the most plausible of his current
hypotheses. The subject tends to become increasingly strict in the criterion
for hypothesis adoption as more hypotheses are generated, a process that works
against the likelihood of obtaining an exhaustive set. However, this strategy
may be rationalized by the subject as a search for a "better" hypothesis,
rather than an exhaustive set. This behavior can be characterized as
"solution" searching rather than an attempt to generate an exhaustive hypothesis
set, as instructed.

The second major result of the Gettys-Fisher study relates to the control of
the  hypothesis  retrieval  process.  New  hypotheses  are  much  more  likely  to  be 
introduced  when  new  data  reduces  the  plausibility  of  the  current  hypotheses. 
This  result  suggests  that  hypothesis  generation  is  cyclic,  and  usually  occurs 
when  the  Decision  Maker  realizes  that  his  current  hypothesis  set  is  inadequate 
in  the  light  of  the  new  data.  This  process  is  a second  major  heuristic  which 
we propose and is consistent with much of the concept identification
literature, which suggests that subjects engage in "win-stay, lose-shift" strategies
(Kintsch,  1970)  where  they  retain  hypotheses  which  are  consistent  with  the 
data  and  test  new  hypotheses  if  the  old  hypotheses  are  inconsistent  with  the 
data. 

Exactly how the plausibility assessment process works is yet to be
established. At present our thinking on this topic is tentative. We assume that the
consistency-checking process previously described is a preliminary
plausibility-screening process rather than a full-blown plausibility assessment.
In many ways the boundaries between consistency checking, plausibility
assessment, and probabilistic inference are arbitrary, and we now believe that
many of the same mechanisms contribute to each of these processes. We believe
that each of these three processes involves retrieval of information from
memory and inductive and deductive reasoning. However, these processes
differ in both their goals and in the relative contributions of memory and
reasoning. Perhaps the following definitions will clarify these
distinctions.

1) Consistency checking may be used by the Decision Maker in his initial
screening of hypotheses, particularly if the data are numerous. If a
hypothesis is retrieved before all data have been processed, the consistency of that
hypothesis will be checked with the remaining data. The hypothesis will be
abandoned if inconsistent information is retrieved from memory. This process
operates very rapidly and involves only superficial memory searching and
reasoning. Its major purpose is to screen hypotheses for obvious defects
before subjecting them to a more exhaustive analysis.

2) Plausibility assessment involves more reflection and deeper
analysis. The major goals are to decide if the hypothesis is sufficiently
plausible to warrant including it in the current hypothesis set and to decide if
the current hypothesis set is sufficiently exhaustive. These goals are
achieved by reasoning with facts and information retrieved from memory using
the data and the hypothesis as retrieval cues. The concept of availability
proposed by Tversky and Kahneman (1973, 1974) with its emphasis on frequency
and ease of memory retrieval is important here. It may be that the
availability of a hypothesis determines its plausibility assessment. Additionally, the
plausibility of the current hypothesis set may be judged by a "metamemory"
process (Lindsay and Norman, 1977); metamemory is the information we have
about the contents of our memory store. For example, if you are asked to give
the date of George Washington's inauguration, your metamemory gives you an
indication of the likelihood of retrieving this date. Plausibility
assessment of the current hypothesis set may be based on metamemory, it may be based
on the ease of hypothesis retrieval or the number of hypotheses retrieved, or
it may be based on all of these processes. Perhaps if your metamemory
suggests that you are knowledgeable in the area, and if your memory searches
no longer successfully retrieve hypotheses, then you conclude that you have
retrieved all of the relevant hypotheses. This conclusion would lead to a
high plausibility assessment for the current hypothesis set.


We include these ideas on the mechanism of plausibility assessments,
speculative as they are, to document some of our current interests. They will not
become a formal part of our model until supported by more data. At this time,
all we can say is that the goal of plausibility assessment is to determine if
hypotheses or groups of hypotheses are plausible enough to process further.
We have identified several heuristic strategies which may be employed by the
Decision Maker in plausibility assessment.

3) Probabilistic inference concentrates on the assignment of subjective
probability measures to specified, enumerated outcomes. Usually the emphasis
is on identifying the most probable hypotheses, or hypothesis. It also
involves reasoning with information retrieved from memory but differs from
plausibility assessment in that the hypotheses are accepted as given; no
attempt is made to generate new hypotheses.

Our current model for the plausibility assessment process is presented in
figure 3. The inputs to this process are hypotheses which have been retrieved
from memory and may have been checked for consistency. If the hypotheses have
survived a consistency-checking process it can be assumed that they are at
least minimally plausible, in the sense that no data inconsistent with their
possibility have been found.


INSERT FIGURE 3 ABOUT HERE

Figure 3. A model for the plausibility assessment process.

3.0 THE MEMORY RETRIEVAL EXPERIMENT


Most  memory  theorists  have  been  concerned  with  retrieval  from  memory  using 
a single  retrieval  cue.  Hypothesis  retrieval,  however,  typically  involves 
the  retrieval  from  memory  using  many  data  as  retrieval  cues,  and  therefore 
must  have  an  additional  mechanism  for  retrieving  hypotheses  which  are 
consistent  with  most  or  all  of  the  data.  Consequently,  an  assumption  that 
memory  is  searched  using  each  datum  in  turn,  and  that  all  hypotheses 
retrieved by any of these single-datum memory searches are processed
further, does not usually produce hypotheses that are consistent with all or
most of the data. In fact, the number of candidate hypotheses produced by
such a process would be unmanageable in size and the number of relevant
hypotheses would be only a small percentage of the total.

However, making the opposite assumption, that a hypothesis must be
retrieved for all data, is equally unsatisfactory because far too few
hypotheses would be produced by such a process. Successful retrieval from memory
requires the presence of the necessary information in memory, and its
accessibility. As the number of data increases, the probability of
retrieving any particular hypothesis for all data becomes vanishingly small.
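
The two limiting cases can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each datum retrieves a given hypothesis independently; the function names and the retrieval probability of 0.5 per datum are hypothetical and are not taken from the experiment.

```python
from math import prod

def p_retrieved_by_all(single_probs):
    """Probability that every datum retrieves the hypothesis (strict limiting case)."""
    return prod(single_probs)

def p_retrieved_by_any(single_probs):
    """Probability that at least one datum retrieves the hypothesis (lenient limiting case)."""
    return 1.0 - prod(1.0 - p for p in single_probs)

# Hypothetical retrieval probability of 0.5 per datum, for 1, 3, and 6 data.
for n_data in (1, 3, 6):
    probs = [0.5] * n_data
    print(n_data, p_retrieved_by_all(probs), p_retrieved_by_any(probs))
```

Under these assumptions the "all data" probability shrinks rapidly as data are added, while the "any datum" probability approaches 1, which is the sense in which the two assumptions bracket human performance.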

These two assumptions do have the virtue that they are limiting cases and
define the ends of a continuum along which human performance must
necessarily fall. The basic strategy employed in this experiment was to
develop a model which could serve as a yardstick to measure where human
performance falls on this continuum. The model then can be used as a
measuring device where its free parameter estimates this location. This
experimental strategy has been employed by Greeno (1970) who has applied
the idea of using a model as a measurement tool to questions in human
learning.

3.1 Details of the Memory Tagging Model

The model assumes that hypotheses may be tagged in response to a datum. In
the case of a multi-data hypothesis search involving I data, the number of tags,
X, is a random variable, $0 \le X \le I$. It is also assumed that the subjects
employ a variable response criterion C, which is adjusted by the subject
depending on the number of data, the instructions, and the amount of
task-relevant information in memory. For these reasons it seems appropriate to
assume that C is also a random variable, $1 \le C \le I$. We assume that C is
distributed in the interval 1 to I with probabilities obtained from the
binomial distribution (N = I - 1, P). This distribution was chosen because
it is discrete, single-peaked, and because it approximates the normal as N
becomes large. Basically, the assumption of a variable criterion which is
probabilistically distributed implies that the average criterion, $\bar{C}$, is the
mean of a random variable. If $X \ge C$, where C is the criterion number of
tags on that trial for that subject, then the subject responds with that
hypothesis. If $X < C$ then no response is made.

The mean criterion, $\bar{C}$, is the free parameter of interest. It is bounded by
1 and by I, which correspond to the two limiting cases discussed earlier.
In the present experiment the basic strategy is to estimate $\bar{C}$ from the
single-datum and multi-data retrieval probabilities and the model. The
estimate of $\bar{C}$ locates human performance along the continuum of interest.

3.2  Method  and  Procedure 

3.2.1 Design. Each subject performed three hypothesis retrieval tasks: one
task in which one datum was used as a retrieval cue and two other tasks in which
three and six data were used. The assignment of tasks to these three
conditions  was  counterbalanced,  so  that  for  each  task  an  equal  number  of 
subjects  were  run  in  the  one,  three,  and  six  data  conditions.  Order  of 
tasks  and  order  of  presentation  were  also  counterbalanced.  Therefore, 
each  subject  performed  once  in  each  of  the  tasks  and  once  in  each  of  the 
three  conditions. 

3.2.2 Hypothesis Generation Tasks. The three hypothesis generation tasks
which were used were those previously employed by Gettys and Fisher (in
preparation). Each task involved the generation of hypotheses. In the
"States" task the possible hypotheses were the 50 U.S.A. States and the
data were notable products and industries of one of these States. In the
"Occupations" task the hypotheses were occupations of skilled tradesmen
and the data were tools typically used by a workman in one of these
occupations. In the "Majors" task, various majors at the University of
Oklahoma were the hypotheses, and the data were classes taken by University
of Oklahoma students. In the various tasks, subjects were given one,
three, or six data and were told to respond with as many hypotheses as
possible which occurred to them. Subjects were told to respond with any
hypothesis that came to mind after inspecting the data without being
concerned with its plausibility or lack of plausibility.

These  tasks  were  carefully  chosen  with  the  following  criteria  in  mind. 
First,  we  wanted  tasks  which  were  within  the  competence  of  male  College 
Freshmen.  For  this  reason  tasks  involving  special  expertise  and  training 
seemed  inadvisable.  Second,  we  wanted  tasks  where  our  subjects  would  be 
roughly  equivalent  in  terms  of  their  memory  store,  suggesting  that  tasks 
which  were  based  on  common  information  which  all  Freshmen  should  possess 
would  be  preferable.  Finally,  the  tasks  should  have  large  numbers  of 
plausible hypotheses for the given data. The "States" task has 50
potential hypotheses. The "Occupations" task has several thousand potential
hypotheses, and there are over 200 Majors at the University of Oklahoma.
The data used in the three tasks are presented in Table 1.


INSERT  TABLE  1 ABOUT  HERE 


In  the  six-data  task  all  six  data  were  presented  on  a single  trial.  The 
order  of  presentation  of  the  data  was  randomized  in  the  one-datum  and  six- 
data  conditions  for  every  subject.  The  order  of  data  within  each  three- 
data  cluster  in  the  three-data  condition  was  also  randomized. 

3.2.3  Subjects.  164  male  University  of  Oklahoma  introductory  psychology 
students  served  as  subjects  in  partial  fulfillment  of  course  requirements. 
Subjects  were  assigned  to  conditions  according  to  a predetermined  random 
block order, a design which called for 144 subjects. Data from the
additional 20 subjects were not used due to experimental errors.

TABLE 1: DATA USED IN THE THREE MEMORY SEARCH TASKS

     STATES               OCCUPATIONS      MAJORS

1.   Beef                 Hammer           Psychology I
2.   Fish                 Drill            U.S. History
3.   Aerospace Industry   Saw              Industrial Psychology
4.   Citrus Fruit         Wrench           Design/Measurement of Work
5.   Tourists             Pipe Threader    Personnel Management
6.   Cypress Trees        Blow Torch       The Behavior of Organizations

3.3 Procedure

3.3.1 Instructions. Upon entering the lab, subjects were seated in front
of an intelligent graphics terminal which presented all instructions and
stimuli, and also recorded all responses. Once the subject was given
informed-consent information, instructions on how to use the terminal
keyboard and to correct typing errors were presented. Then two demonstration
problems were presented. These problems consisted of giving three traits
and/or characteristics of animals, and naming an animal which was
consistent with all traits. After the two demonstration problems, the subject
was given a similar practice hypothesis generation problem where his task
was to generate "animal" hypotheses. Subjects were given 60 seconds to
type in their hypotheses. Typing time was not included in the 60 second
interval. Once the "return" key was pushed to enter the hypothesis, it
disappeared from the screen. This procedure was used to reduce the
possibility that subjects would generate hypotheses which were associates of
previously-generated hypotheses.

3.3.2 Data Collection. After the practice problem was completed, subjects
were given the first experimental problem. The data were presented (i.e.,
products of States, tools, or classes) and subjects were told to type in
any hypotheses which came to mind that were relevant to the data. They
were also told that their responses should be based upon all the data
presented on one trial. Nine trials were always presented to each subject
in the experimental phase. One six-data trial involved generating
hypotheses in response to all six data of one of the tasks. Two three-data
trials involved generating hypotheses to the first three data and the last
three data of another task. There were six single-datum trials which
involved generating hypotheses for each of the individual data of the
remaining task. The order of presentation of tasks was counterbalanced,
and each task was presented as either one, two, or six trials for each
subject. The order of the data within each multiple-data trial was
randomized for each subject. The order of the single-datum trials was also
randomized. A 60 second interval was given for subjects to generate
hypotheses, just as in the practice problem. The clock which measured the
60 second interval was stopped when the subject typed the first letter
of each hypothesis, and was restarted when the carriage return key was
pressed. This procedure was used to subtract typing time from memory
search time because of the large individual differences among the
subjects in typing speed. Under these conditions 60 seconds was ample time
to respond; subjects almost always ceased hypothesis generation before
the end of the 60 second interval.

3.4 RESULTS AND DISCUSSION

3.4.1 The calculations of the predictions of the model. The data of this
experiment were the hypotheses retrieved from memory for each of the three
tasks for either one, three, or six data. Each of the nine problems was
attempted by 48 subjects. These results were used to calculate the
probability of retrieval of various hypotheses in the various retrieval
conditions. Thus, for a hypothesis of interest, nine retrieval probabilities
could be estimated. These were six one-datum, two three-data, and one six-data
retrieval probabilities.

These retrieval probabilities were then used as inputs of the memory-tagging
model; the single-datum retrieval probabilities were used to predict the
three-data and six-data retrieval probabilities and to estimate $\bar{C}$, the
average criterion for both multi-data conditions.

A more formal derivation of the predictions of the model and two proofs that
the estimation of the average criterion is unique are presented in appendix A.
Here a parallel explanation is provided at the conceptual level because these
ideas, although simple, require complicated and difficult notation if
presented formally. The basic data of this study are probabilities of generating
any particular hypothesis, $P(X \ge C)$. As mentioned previously, X is the number of
tags for any given hypothesis and C is the criterion employed on that trial.
Therefore, both X and C are random variables:

$X = 0, 1, \ldots, i, \ldots, I$

and

$C = 1, 2, \ldots, i, \ldots, I$

The probability of generating a hypothesis is equal to

4) $P(X \ge C) = \sum_{i=1}^{I} P(C = i \cap X \ge i)$.

In words, a hypothesis will be generated when the number of tags, X, is
greater than or equal to the response criterion, C, employed on that trial. It is
assumed that the criterion is independent of the number of tags, which gives
the following expression:

5) $P(X \ge C) = \sum_{i=1}^{I} P(C = i)\,P(X \ge i)$.

The  model  is  now  written  in  terms  of  two  independent  processes,  the  response 
criterion  process  and  the  memory  tagging  process. 

First to be discussed is the memory tagging process. It will be recalled that
one condition of the experiment collected single-datum retrieval
probabilities. In the single-datum case the number of tags, X, can be either 0 or 1.
Furthermore, the only reasonable criterion that the subject can employ is
$C = 1$. Consequently, if subjects tag a hypothesis they retrieve it with
probability 1.0. So we can write for this single-datum case
$P(X \ge C) = P_s(X = 1)$, where s stands for a single-datum task.

The next step is to calculate the multi-data retrieval probabilities, $P_m(X \ge i)$,
where m stands for a multi-data probability, from the single-datum
probabilities, $P_s(X = 1)$. This can be done by assuming that the probability of
tagging a hypothesis in a multi-data retrieval situation for each datum is
the same as the probability of tagging that same hypothesis for the same datum
in a single-datum situation.

This assumption probably is at best only approximately true. We make it for
mathematical tractability. It seems most plausible if memory-tagging is
assumed to be a counting process, where the subject recalls the number of
times that a given hypothesis had been encountered in previous searches of
memory.

Given this assumption, the calculation of $P_m(X \ge i)$ is straightforward.

Consider a three-data retrieval condition from the States task where the data
were 1) Citrus Fruit, 2) Tourists, and 3) Cypress Trees and the hypothesis was
California. The possible outcomes of this three-datum retrieval task are
shown in figure 4 in the form of a tree diagram. The probabilities shown on
the tree branches are the actual single-datum retrieval probabilities $P_s(X = 1)$
and $P_s(X = 0)$. Memory tagging is denoted by X and not tagging for that datum is
denoted by $\bar{X}$.


INSERT  FIGURE  4 ABOUT  HERE 


The path probabilities and the number of tags resulting are shown to the right
of the tree. Shown below the tree are the calculations of $P_m(X \ge i)$, which
complete the calculation of the memory tagging probabilities. The
calculations of $P_m(X \ge i)$ for the six-data case follow the same procedure, except that
the tree has six data.
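
The tree calculation can be sketched in a few lines of code. This is a minimal illustration, not the authors' program: the single-datum probabilities below are hypothetical placeholders rather than the values shown in figure 4, the function names are invented for the example, and tagging is treated as independent across data, as the assumption stated above implies.

```python
from itertools import product

def tag_count_distribution(single_probs):
    """Return [P(X=0), ..., P(X=I)] by enumerating every path of the tree."""
    n_data = len(single_probs)
    dist = [0.0] * (n_data + 1)
    for path in product((0, 1), repeat=n_data):  # 1 = tagged, 0 = not tagged
        path_prob = 1.0
        for tagged, p in zip(path, single_probs):
            path_prob *= p if tagged else 1.0 - p
        dist[sum(path)] += path_prob  # sum(path) is the number of tags on this path
    return dist

def tail_probabilities(single_probs):
    """Return [P(X>=0), P(X>=1), ..., P(X>=I)]."""
    dist = tag_count_distribution(single_probs)
    return [sum(dist[i:]) for i in range(len(dist))]

# Hypothetical single-datum retrieval probabilities for one hypothesis.
example_probs = [0.5, 0.4, 0.2]
tails = tail_probabilities(example_probs)
for i, value in enumerate(tails):
    print(f"P(X >= {i}) = {value:.2f}")
```

The same functions handle the six-data case by passing six probabilities, in which case the tree has 64 paths.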

The calculation of $P(C = i)$ depends on P, the binomial generating probability.
An expression for $P(C = i)$ is written as a function of P and the number of data
in the task, I:

6) $P(C = i) = \binom{I-1}{i-1} P^{\,i-1} (1 - P)^{I-i}$

We estimated P for each hypothesis in all three-data and six-data tasks by
means of a computer search algorithm which found the value of P such that the
$P(X \ge C)$ predicted by the model is equal to the obtained $P(X \ge C)$. Once the
generating value of P was obtained for each hypothesis, the average value of
the binomial generating probability for each task, $\bar{P}$, was calculated. The
value of $\bar{P}$ was then entered into an expression for $\bar{C}$, the average criterion
value:

7) $\bar{C} = (I - 1)\bar{P} + 1$.

The estimate of $\bar{C}$ is the desired result of the experiment, and is the average
criterion number of memory tags used by the subject in the various conditions
of the experiment.
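
The estimation procedure can be sketched as follows. The report says only that a "computer search algorithm" was used; bisection is assumed here because the predicted retrieval probability cannot increase as P increases. The function names and the example probabilities are hypothetical, and the criterion distribution follows the binomial (N = I - 1, P) assumption stated in section 3.1.

```python
from math import comb

def tail_probabilities(single_probs):
    """Return [P(X>=0), ..., P(X>=I)], assuming independent tagging per datum."""
    dist = [1.0]  # distribution of the tag count after zero data
    for p in single_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # datum does not tag the hypothesis
            new[k + 1] += q * p       # datum tags the hypothesis
        dist = new
    return [sum(dist[i:]) for i in range(len(dist))]

def criterion_probability(i, n_data, p):
    """P(C = i) for i = 1..I, with C - 1 distributed binomial (I - 1, p)."""
    return comb(n_data - 1, i - 1) * p ** (i - 1) * (1.0 - p) ** (n_data - i)

def predicted_retrieval(single_probs, p):
    """P(X >= C): sum over i of P(C = i) * P(X >= i)."""
    n_data = len(single_probs)
    tails = tail_probabilities(single_probs)
    return sum(criterion_probability(i, n_data, p) * tails[i]
               for i in range(1, n_data + 1))

def estimate_p(single_probs, observed, iterations=60):
    """Find p whose prediction matches the observed probability (bisection, assumed)."""
    if observed >= predicted_retrieval(single_probs, 0.0):
        return 0.0
    if observed <= predicted_retrieval(single_probs, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if predicted_retrieval(single_probs, mid) > observed:
            lo = mid  # prediction too high: a stricter criterion is needed
        else:
            hi = mid
    return (lo + hi) / 2.0

def mean_criterion(n_data, p_bar):
    """Average criterion number of tags, (I - 1) * P-bar + 1."""
    return (n_data - 1) * p_bar + 1.0

# Hypothetical three-data example: single-datum probabilities and an observed
# multi-data retrieval probability of 0.35.
p_hat = estimate_p([0.5, 0.4, 0.2], 0.35)
print(p_hat, mean_criterion(3, p_hat))
```

With these made-up inputs the search recovers a value of P close to 0.5, which corresponds to a mean criterion of about two tags. In the report P is first averaged over hypotheses within a task before the mean criterion is computed.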


Figure 4. The calculation of $P(X \ge i)$ from the single-datum
retrieval probabilities.

If the model is to be used as a measuring device to locate the subjects along
the continuum of the criterion number of tags, it is necessary to evaluate
the goodness of fit of the model. This can be done by using the model and
estimating $P_m(X \ge C)$ for each hypothesis in the task using $\bar{P}$ as the binomial
generating probability. (This is equivalent to using $\bar{C}$ and gives the same
result.) Here, rather than estimating P by use of the model, $P(X \ge C)$ is
estimated from the model and $\bar{P}$.

If the model fits the data well we would expect to find a high linear relation
between the predicted and the obtained $P(X \ge C)$ values, and the slope of the
best-fitting line between the obtained and predicted $P(X \ge C)$ values should be
1.0.
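
The two goodness-of-fit indices can be computed as in the sketch below. The data are hypothetical, the function name is invented for the example, and the text does not say which variable was regressed on which, so the choice of predictor here is an assumption.

```python
from math import sqrt

def pearson_r_and_slope(x, y):
    """Return (r, b): the correlation and the least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx

# Hypothetical predicted and obtained retrieval probabilities for four hypotheses.
predicted = [0.10, 0.20, 0.40, 0.60]
obtained = [0.07, 0.19, 0.43, 0.67]
r, b = pearson_r_and_slope(predicted, obtained)
print(r, b)
```

In this made-up example the points lie on a straight line, so r is 1, but the slope is 1.2 rather than 1.0, which illustrates why the slope is reported alongside the correlation.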

Table 2 gives the $\bar{C}$ estimates which were made for all hypotheses that had
nonzero $P_s(X \ge C)$ and $P_m(X \ge C)$ retrieval probabilities for all trials. The number
of hypotheses meeting this criterion is shown for each condition. A list of
these hypotheses and their predicted and obtained retrieval probabilities is
given in appendix B. Also shown in table 2 are various goodness-of-fit
indices.


The results of primary interest are the $\bar{C}$ estimates. As can be seen in table 2,
the $\bar{C}$ estimates for the six three-data problems range between 1.89 and 2.23.
The average value of $\bar{C}$ across all three-data problems is 2.01. The $\bar{C}$
estimates for the six-data problems are higher, ranging from 2.42 to 3.20, and are
on the average equal to 2.88.


INSERT  TABLE  2 ABOUT  HERE 


The yardstick that the model affords suggests that about 2.0 memory tags are
necessary for hypothesis generation in the three-data tasks, while about 2.9
tags are necessary in the six-data tasks. The rate of increase of $\bar{C}$ as a
function of the number of data in the task is interesting. If we assume that
$\bar{C} = 1$ for one-datum tasks, then we have $\bar{C} = 1$, $\bar{C} = 2.01$, and $\bar{C} = 2.88$ for the one-,
three-, and six-data tasks, respectively. The value of $\bar{C}$ used increases much
more slowly than the number of data.

The major conclusion supported by these results is that hypotheses are
generated in multi-data retrieval tasks when they are tagged by two or three data.


TABLE 2

ESTIMATION OF THE CRITERION NUMBER OF MEMORY TAGS, $\bar{C}$, AND
GOODNESS-OF-FIT INDICES

                                                          95% CONFIDENCE
                                                          INTERVAL FOR b
                     HYPOTHESES   $\bar{C}$     r        b       LOWER    UPPER

3 DATA TASKS
States d1-d3             15        1.96      .935    1.144     .885    1.403
States d4-d6             14        1.92      .962     .997     .818    1.176
Majors d1-d3             20        2.23      .887    1.352*   1.004    1.701
Majors d4-d6             16        1.89      .892    1.459*   1.035    1.882
Occupations d1-d3        17        1.98      .947    1.099     .894    1.305
Occupations d4-d6        14        2.10      .872    1.138     .735    1.540
MEANS:                             2.01      .923    1.198

6 DATA TASKS
States d1-d6              9        3.20      .987     .954     .821    1.087
Majors d1-d6             12        3.04      .923    1.183     .835    1.531
Occupations d1-d6         8        2.42      .874    1.030     .457    1.603
MEANS:                             2.88      .949    1.056

*P < .05

This result has located human performance along a continuum of possible
models. Clearly the two limiting-case models discussed previously can be
rejected on the basis of these results. Any future hypothesis generation
models that are developed should have the characteristic that hypotheses are
retrieved from part, but not all of the data.

3.4.2 The goodness of fit of the model. We have also developed a model which,
despite its simplicity, fits the data well. The coefficients of correlation,
r, between the obtained $P_m(X \ge C)$ and the values predicted by the model are
shown in the middle of table 2.

The model fits the data about as well as could be expected. The mean
correlations are .923 and .949, calculated from the Fisher r to z transform, for
the three-data and six-data tasks respectively. The $P_m(X \ge C)$ probabilities are
unreliable because they are estimated from 48 trials. For 48 trials with
P = .5, for example, the 95% confidence interval of a binomial probability is
± .139. A calculation was performed to investigate the impact of the
unreliability of the $P_m(X \ge C)$ probability estimates. The variance of each $P_m(X \ge C)$
estimate was calculated for each problem for a binomial process where
$P = P_m(X \ge C)$, N = 48. The mean of these problem variances was .00230, which is an
estimate of how well a perfect model would do if the only source of error was
binomial variation. The variance between the model's $P(X \ge C)$ and the obtained
$P(X \ge C)$ contains both errors of prediction of the model and binomial variance.
These variance estimates are .00828 for the three-data problems and .00486
for the six-data problems. This analysis shows that roughly 25% to 50%
of the unexplained variance is due to variation in the criterion, and hence is
intrinsic variability. These analyses show that the model fits about as well
as can be expected, and that it predicts nearly all the variance that can be
accounted for by any model.
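
The arithmetic in this paragraph can be retraced with a short sketch. The correlations are the r values listed in table 2 and the variances are those quoted above; the function names are illustrative, and the normal approximation used for the confidence interval is an assumption, since the report does not state how its interval was obtained.

```python
from math import atanh, sqrt, tanh

def fisher_mean_r(r_values):
    """Average correlations in Fisher z space, then transform back to r."""
    z_mean = sum(atanh(r) for r in r_values) / len(r_values)
    return tanh(z_mean)

def binomial_variance(p, n):
    """Sampling variance of a proportion estimated from n binomial trials."""
    return p * (1.0 - p) / n

# Correlations from table 2 for the three-data and six-data problems.
r_three_data = [0.935, 0.962, 0.887, 0.892, 0.947, 0.872]
r_six_data = [0.987, 0.923, 0.874]
mean_r_three = fisher_mean_r(r_three_data)
mean_r_six = fisher_mean_r(r_six_data)

# Normal-approximation half-width of a 95% interval for P = .5, N = 48.
half_width = 1.96 * sqrt(binomial_variance(0.5, 48))

# Share of the model's residual variance attributable to binomial sampling error.
share_three_data = 0.00230 / 0.00828
share_six_data = 0.00230 / 0.00486

print(mean_r_three, mean_r_six, half_width, share_three_data, share_six_data)
```

Under these assumptions the mean correlations agree with the reported .923 and .949 to three decimals, the half-width comes out near .14 (close to the ± .139 quoted, which may have been computed by a different method), and the two variance ratios fall between roughly one quarter and one half.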

We also calculated the slope, b, of the best-fitting line between observed
$P(X \ge C)$ and the predicted $P(X \ge C)$. These slopes are shown in table 2 with the
upper and lower boundaries of the 95% confidence interval for these slopes.
Seven of the nine slopes include 1.00 in their 95% confidence interval.
However, there appears to be a tendency for the model to underestimate low
$P(X \ge C)$ and to overestimate high $P(X \ge C)$ values.

In summary, the model is adequate for the measurement purposes for which it
was designed, and fits the data considerably better than we expected. It
allows us to estimate that hypothesis generation requires two memory tags in
the three-data case and 2.9 tags in the six-data case.

3.^.3  The  minimally-adequate  hypothesis  set.  One  question  of  considerable 
theoretical  and  applied  significance  that  is  still  to  be  addressed  is  the 
adequacy of the subjects' performance. Their task was to generate hypotheses.
How well did they do at the hypothesis generation tasks? We have invented the
concept of a "minimally-adequate hypothesis set" to characterize the adequacy
of their performance. A minimally-adequate hypothesis set is defined as a set
of hypotheses that is quite likely to contain the correct hypothesis. The
phrase "quite likely" can be defined as the user wishes. For example, it can
be defined as a probability of .95 that the current hypothesis set contains
the generating hypothesis. In this case, a minimally-adequate hypothesis set
is defined as a hypothesis set having a plausibility of .95 (see equation 3 in
section 2.3). Alternatively, the minimally-adequate hypothesis set can be
defined by knowledgeable experts as the minimum set that should be considered
given the data. Therefore, the hypotheses that should be contained in this
set can be determined by Bayesian techniques, by knowledgeable experts, or by
any combination of the two.
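
The Bayesian form of this definition can be illustrated with a minimal
sketch. It assumes that the plausibility of a set of mutually exclusive
hypotheses is the sum of their posterior probabilities, and that the set is
built by adding the most probable hypotheses first. The function name, the
greedy rule, and the example probabilities are hypothetical and are not taken
from the report.

```python
def minimally_adequate_set(posteriors, threshold=0.95):
    """Greedily collect the most probable hypotheses until their summed
    posterior probability reaches the threshold (an assumed procedure)."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for hypothesis, probability in ranked:
        if total >= threshold:
            break
        chosen.append(hypothesis)
        total += probability
    return chosen, total


# Hypothetical posterior probabilities for four mutually exclusive hypotheses.
example = {"A": 0.50, "B": 0.30, "C": 0.17, "D": 0.03}
hypothesis_set, plausibility = minimally_adequate_set(example)
print(hypothesis_set, round(plausibility, 2))
```

With these made-up values the first three hypotheses are needed before the
summed plausibility passes .95; a set defined by experts would simply replace
the greedy rule with a fixed list.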

The  minimally-adequate  hypothesis  set  can  be  useful  in  characterizing  the 
adequacy  of  the  subjects'  performance.  Good  performance  would  be  marked  by 
all  or  most  subjects  achieving  a minimally-adequate  hypothesis  set. 

We have established minimally-adequate hypothesis sets for the three six-data
tasks. Because we were acting as knowledgeable experts, we deliberately chose
these sets using a conservative criterion that was biased in favor of our
subjects. There were three hypotheses in each set, and these choices were
based on additional library research on the tasks and on our initial
knowledge when we designed the tasks. We compared the six-data hypothesis
sets generated by the subjects to the minimally-adequate hypothesis set. We
then counted the number of hypotheses common to both sets, and converted
these numbers to cumulative percentages. Table 3 shows these results and the
minimally-adequate hypothesis sets.
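
The counting step described above can be sketched as follows. The reference
set is the STATES set listed in Table 3; the four subject responses and the
function name are hypothetical and serve only to illustrate the arithmetic.

```python
def cumulative_percentages(subject_sets, reference_set):
    """Percentage of subjects whose hypothesis set shares at least k
    hypotheses with the reference set, for k = len(reference) down to 1."""
    reference = set(reference_set)
    overlaps = [len(reference & set(s)) for s in subject_sets]
    n = len(overlaps)
    return {
        k: 100.0 * sum(1 for o in overlaps if o >= k) / n
        for k in range(len(reference), 0, -1)
    }


reference = ["CALIFORNIA", "FLORIDA", "TEXAS"]
# Hypothetical responses from four subjects.
subjects = [
    ["CALIFORNIA", "FLORIDA", "TEXAS", "ARIZONA"],
    ["CALIFORNIA", "TEXAS"],
    ["CALIFORNIA", "HAWAII"],
    ["FLORIDA", "TEXAS", "NEVADA"],
]
result = cumulative_percentages(subjects, reference)
print(result)
```

Because the percentages are cumulative, the value for "at least one" can
never be smaller than the value for "at least two", which matches the layout
of Table 3.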

INSERT TABLE 3 HERE


As can be seen from an inspection of Table 3, the percentage of subjects who
achieved what might be charitably called "minimally-adequate" performance
ranged between 0.0% and 36.5%. Even when the criterion of performance is
relaxed still further to two of the three hypotheses, the subjects perform
poorly by this extremely liberal criterion of adequate performance! If
these tasks were particularly difficult, or required particular expertise,
perhaps this level of performance would be satisfactory, but these tasks were
deliberately chosen because their content should be familiar to our subjects.

These results, if they can be generalized to experts who are working within
their specialties, support our contention that decision-aiding in hypothesis
generation should be explored as a possible means of increasing the
probability that the subject attains a minimally-adequate hypothesis set.


TABLE 3


THE CUMULATIVE PERCENTAGE OF SUBJECTS WHOSE HYPOTHESIS SETS
INCLUDED EITHER THREE, TWO OR ONE HYPOTHESES FROM THE
MINIMALLY-ADEQUATE HYPOTHESIS SET


                                           PERCENT OF SUBJECTS
              MINIMALLY-ADEQUATE           RESPONDING WITH AT LEAST
TASK          HYPOTHESIS SET                  3        2        1

STATES        CALIFORNIA                    36.5     66.3    100
              FLORIDA
              TEXAS

OCCUPATIONS   PLUMBER                        4.3     47.8     91.3
              ELECTRICIAN
              OILFIELD WORKER

MAJORS        PSYCHOLOGY                     0.0     36.0     84.0
              MANAGEMENT
              INDUSTRIAL ENGINEERING


4.0 THE VERIDICAL PLAUSIBILITY STUDY


4.1 Purpose

Gettys and Fisher (in preparation) demonstrated that the subjective
plausibility of new hypotheses influences whether or not these hypotheses
will be employed in an inference task, and that the plausibility of the
current hypothesis set partially controls the memory search process.
However, this study did not examine the accuracy of the subjective
plausibility reports. The basic strategy of the veridical plausibility study
was to obtain plausibility judgments in a situation where veridical
plausibilities could be estimated.

The assessment of the accuracy of plausibility estimates is quite important
in determining the most effective location of decision-aiding efforts. The
memory retrieval experiment discussed previously showed that subjects are
poor at retrieving minimally-adequate hypothesis sets, thus making hypothesis
generation an obvious target for decision aiding. In the veridical
plausibility study a similar assessment of the plausibility estimation
process is made to gain some understanding of human capabilities in this
aspect of hypothesis generation.

4.2 Method

4.2.1 An Overview of the Method. The design of this study is based on the
plausibility assessment model presented previously in section 2.3. This
model proposes that there are two plausibility assessment processes, one
which evaluates individual hypotheses and a second that evaluates the
entire hypothesis set. Subjects in this study made estimates corresponding
to the two types of plausibility assessments.

A second independent variable in this study was the plausibility of the
entire hypothesis set. The plausibility of the hypothesis set was
systematically varied over three values: low, medium, or high.

A third independent variable was the number of data in each judgment task.
Either one, three, or six data were employed in the various tasks. This
variable was manipulated to determine if number of data has an effect on
plausibility judgments, as it has on Bayesian inference tasks (Slovic and
Lichtenstein, 1971).

Finally, we manipulated whether or not the subjects knew the prior odds of
the hypotheses. It became obvious early in the rather extensive series of
pilot studies that plausibility estimates are considerably different from
typical odds estimates, and we decided to determine whether this effect
was due to gross mis-estimation of the prior odds. The design employed 1)
three levels of plausibility of the current hypothesis set and 2) three
levels of the number of data as completely-crossed factorial within-subject
variables. The presence or absence of prior odds was a between-subjects
variable.

4.2.2 Subjects. Subjects were introductory psychology students. All
subjects were males; 16 were randomly assigned to the "priors" condition
and an equal number to the "no priors" condition.

4.2.3 Apparatus. Part of the instructions and the data collection in the
actual study were under the control of an intelligent graphics terminal, a
Compucolor model 8051, manufactured by the Intelligent Systems Corporation,
Norcross, GA. The computer has color graphics which can be controlled by a
light pen.

4.2.4 Estimation of veridical plausibilities. The basic task of this
study was for subjects to estimate the plausibility of various hypothesized
majors given some information on the classes an unknown undergraduate
student had taken. This task was chosen because it is possible to
obtain actual frequency counts from enrollment records. The data used were
for all non-transfer undergraduate students enrolled at the University of
Oklahoma in the Fall of 1977. The data base consisted of 116,875 enrollment
records, where each record was of a class taken by a student.

Software was developed to determine from the enrollment records the
frequency of majors of students who had taken either one, three, or six
specified classes. The frequencies of these majors were used as relative
frequency estimates of the posterior probabilities. This technique was
employed so that assumptions of conditional independence were unnecessary
to evaluate the posterior probabilities. The problems chosen by this
technique are shown in appendix C.
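
The relative-frequency technique described above can be sketched as follows.
This is an illustration only and is not the software used in the study: the
record layout (student identifier, major, course), the course names, and the
function name are all assumptions.

```python
from collections import Counter, defaultdict


def posterior_by_major(records, classes):
    """Relative frequency of each major among students who took every
    class in `classes`. Each record is (student_id, major, course)."""
    courses_taken = defaultdict(set)
    major_of = {}
    for student_id, major, course in records:
        courses_taken[student_id].add(course)
        major_of[student_id] = major
    wanted = set(classes)
    # Keep only students whose course list contains all specified classes.
    matching = [s for s, taken in courses_taken.items() if wanted <= taken]
    counts = Counter(major_of[s] for s in matching)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {major: count / total for major, count in counts.items()}


# Hypothetical enrollment records, one per class taken by a student.
records = [
    (1, "PSYCHOLOGY", "STAT 101"), (1, "PSYCHOLOGY", "PSY 201"),
    (2, "MANAGEMENT", "STAT 101"), (2, "MANAGEMENT", "PSY 201"),
    (3, "PSYCHOLOGY", "STAT 101"), (3, "PSYCHOLOGY", "PSY 201"),
    (4, "MANAGEMENT", "STAT 101"),
]
posterior = posterior_by_major(records, ["STAT 101", "PSY 201"])
print(posterior)
```

Because the majors are counted only among students who took all of the
specified classes together, no assumption of conditional independence between
classes is needed, which is the property the text emphasizes.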

This  particular  inference  task  was  chosen  because  it  seemed  well  suited  to 
the  requirements  of  the  plausibility  estimation  study.  Those  requirements 
were  1)  that  the  task  possess  a large  number  of  possible  hypotheses,  2) 
that  the  task  allow  the  estimation  of  veridical  plausibilities  with  a 
minimum  of  assumptions,  and  3)  that  the  relationships  between  data  and 
hypotheses be intuitively understandable to the subjects.

This task met these three criteria, but it should be emphasized that the
task was quite difficult for the subjects. The exact relationship between
classes and majors is far from obvious; a modern university has multiple
requirements for graduation at the departmental, college, and university
levels. A student therefore chooses his program of courses partially on the
basis of requirements, partially on the basis of his interests, and
partially on the basis of his career goals. While the students of a
university should have a good intuitive understanding of these variables in
a general sense, no one person is privy to all of this information.

4.2.5 Problems. Eighteen problems having one, three or six data were
presented in random order; each contained three specified hypotheses about
the possible major of the unknown undergraduate student and a fourth
"catch-all" hypothesis. The plausibility of the "catch-all" for a third of
the problems was in the range of 0 to 33%, a third in 34 to 66% and a third
in 67 to 100%. In this way, the plausibility of the "catch-all" was crossed
with number of data, with two problems nested in each cell.

Subjects assessed plausibilities using a magnitude estimation procedure;
responses may be interpreted as posterior odds estimates.

4.2.6 Instructions. The "instructions" phase of the session lasted for
about 20 minutes and was computer-assisted. The entire session lasted
about an hour, although there was some variation since the study was
subject-paced.

The instructions consisted of a graduated series of four tasks designed to
familiarize the subjects with the procedures necessary to undertake the
experimental task. The first task was practice in using a light pen in
conjunction with the display. The second task introduced the magnitude
estimation procedure to subjects. In this task the length of lines was
adjusted with the light pen to estimate the relative areas of four
rectangles. The third task was practice in magnitude estimation on a
probabilistic inference task which involved predicting election outcomes.
The last instructional task was a practice problem similar to those used in
the actual experiment. Immediately following the instructional session, the
data-collection phase began.

4.2.7 The Data Collection Procedure. Each of the 18 problems was presented
in two frames on the computer screen. (A frame on the computer screen had a
blue background and was rectangular, 30.8 cm wide by 27.5 cm high.)


Frame  One.  In  frame  one  of  each  trial,  the  subjects  evaluated  the 
three  specifically-named  hypotheses  separately  and  also  evaluated  the 
"catch-all"  hypothesis.  This  task  differed  little  from  a typical  Bayesian 
inference  task  except  that  the  catch-all  estimate  was  employed. 

Figure  5 is  a color  Xerox  reproduction  of  a photograph  of  this  display.  It 
included four principal components. At the top there was a brief synopsis
of  the  instructions  as  a constant  reminder  to  the  subject.  Data  and 
hypothesis  areas  displayed  the  data  and  hypotheses,  respectively.  In  the 
center  right  of  the  display  were  four  horizontal  lines  which  the  subjects 
adjusted  using  the  light  pen  to  make  their  plausibility  estimates.  When 
the  light  pen  was  aimed  at  a point  on  a line  and  was  triggered,  the  segment 
left  of  the  pen  turned  red.  The  subject  was  allowed  to  interactively 
adjust  these  lines  until  he  was  satisfied  with  his  plausibility  responses. 


Figure 5. Frame one of the display as it was seen by the
"priors" subjects.

Also shown in Figure 5 is the method of indicating prior odds. The small
red "tick marks" were set at the prior odds for all four lines in the
"priors" condition only.


INSERT  FIGURE  5 ABOUT  HERE 


Frame Two. The second frame of each problem pitted the collection of
specifically-named hypotheses against the catch-all, so that subjects
estimated the relative plausibilities of two collections of hypotheses.
Frame two contained the same data and hypotheses that were presented in the
first frame. There were two 22.4 cm lines, having the same function as the
four lines of the first frame. The three specifically-named hypotheses of
the first frame were printed over the top of the upper line and the phrase
"all other majors" was printed over the bottom line.

Subjects adjusted the lengths of these two lines to estimate the
plausibilities of the two collections of hypotheses. The two frames were
essentially identical in other details, and red "tick marks" were also used
to indicate the prior odds for those subjects in the "priors" condition.


4.3 Results and Discussion

4.3.1 Plausibility Assessment of Hypothesis Sets. A number of analyses
have been performed on these data to examine various major and minor
questions of interest. One question of central interest is how well
subjects can estimate the plausibility of collections of hypotheses. An
analysis of variance was performed on the posterior log odds of the
subjects' responses to examine this question. For the frame one data,
posterior log odds were calculated from equation 3 by making the appropriate
transformations. For frame two, where subjects were estimating the
plausibility of the collection of hypotheses directly, only a log
transformation was necessary. All odds estimates were written in favor of
the specified hypotheses. The results of the analysis of variance showed that
the main effects of pages (frames) of the display, plausibility, and number
of data were significant at the p < .001 level. The effect due to prior
probabilities was non-significant. Also, the number of data by plausibility
interaction and the four-way interaction involving the above variables and
pages of the display were significant at the p < .05 level. These results
will be discussed in turn.

On frame one, subjects indirectly rated the collection of specified
hypotheses as being noticeably more likely than these same hypotheses were
rated on frame two (F = 38.17; df = 1,30; p < .001). The geometric mean odds
judgment was 2.99 for frame one and 1.956 for frame two. This significant
difference in average log odds estimates is attributable to the difference
between calculating the plausibility of the current hypothesis set from odds
estimates on frame one and the direct estimates of frame two. If the
indirect frame one estimates were perfectly consistent with the direct
frame two estimates then the means for both frames should be equal. The
significant difference between the means therefore indicates that the
subjects were somewhat inconsistent in the two types of estimates. This
effect was anticipated, and was the motivation for obtaining the frame two
estimates. Equation 3, which was used to aggregate the frame one estimates,
is based on the union of mutually exclusive events, and this result shows
that such a sum typically exceeds a direct estimate of the same union.
Evidently the combination rule used by the subjects is not perfectly
isomorphic with equation 3.


Figure 6. Subjective plausibility estimates (subject's log odds) as a
function of veridical plausibility (Bayes log odds) for either one, three
or six data.
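
The comparison between the indirect frame one estimates and the direct frame
two estimates discussed above can be sketched as follows. Equation 3 is not
reproduced in this section, so the sketch assumes only what the text states:
that the frame one estimates for the mutually exclusive specified hypotheses
are summed. The function names and the line lengths are hypothetical.

```python
import math


def frame_one_set_odds(specified, catch_all):
    """Odds for the specified set on frame one: the separate estimates are
    summed (assumed form of the aggregation) and divided by the catch-all."""
    return sum(specified) / catch_all


def frame_two_set_odds(specified_set, catch_all):
    """Odds for the specified set on frame two: one direct estimate each."""
    return specified_set / catch_all


def geometric_mean(odds):
    """Geometric mean, i.e. the exponential of the mean log odds."""
    return math.exp(sum(math.log(o) for o in odds) / len(odds))


# Hypothetical line lengths (arbitrary units) for one problem.
indirect = frame_one_set_odds([6.0, 3.0, 3.0], 4.0)   # (6 + 3 + 3) / 4
direct = frame_two_set_odds(10.0, 5.0)                # 10 / 5
print(indirect, direct, geometric_mean([2.0, 8.0]))
```

In this made-up problem the summed frame one odds exceed the direct frame two
odds, which is the direction of the inconsistency reported in the text.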

The effects due to number of data (F = 15.6; df = 2,60; p < .001) and
diagnosticity (F = 18.74; df = 2,60; p < .01) were significant, as was the
number of data by diagnosticity interaction (F = 3.51; df = 4,120; p <
.01). Figure 6 shows these results as a plot of the subjective
plausibilities in log odds form versus the veridical plausibilities in log
odds form. All odds ratios are written in favor of the specified hypothesis
set, so positive log values (odds greater than 1.0) mean that the data favor
the specified hypothesis set, while negative log values mean that the data
favor the catch-all hypothesis set.


INSERT  FIGURE  6 ABOUT  HERE 


The general form of these functions is remarkably similar across the priors
and no priors conditions and across the frame one and frame two estimates.
Perhaps the most striking aspect of these results is that the specified
hypothesis set is judged more likely than the catch-all set in the vast
majority of cases. Each point on this graph is the mean response to two
problems. When the mean plausibility responses to the 18 problems were
individually examined, the specified hypotheses were favored by 16 of the 18
mean responses. A subject who responds in a veridical fashion should favor
the specified hypotheses for 9 of the 18 problems. Even when the data
strongly support the catch-all hypothesis set, the subjects usually still
evaluate the specified hypothesis set as the more likely. Furthermore, the
magnitude of their responses as compared to those of a veridical subject
suggests that they have difficulties in evaluating the plausibility of
hypothesis sets.


Our first reaction to these results was surprise. However, when considering
these results in combination with the results of the memory search
experiment an explanation emerges.

First, assume that the estimation of both the plausibility of single
hypotheses and of sets of hypotheses is based on the availability of
information in memory (Tversky and Kahneman, 1973). This information
relates the data and hypotheses in memory. It seems logical to assume that
subjects can retrieve this information in the case of specified hypotheses,
since both data and hypotheses are available to act as retrieval cues.
However, in the case of the unspecified hypotheses in the catch-all
hypothesis set, possible hypotheses must first be retrieved from memory
before the plausibility of the catch-all set can be evaluated. The
deficiencies in this hypothesis retrieval process were demonstrated in the
memory search experiment, so we can expect that the catch-all set will be
drastically underpopulated. Consequently, the plausibility of the catch-all
set relative to the specified hypothesis set will be underestimated
because not enough plausible hypotheses are retrieved from memory for the
catch-all set. When the plausibility of the catch-all set is
underestimated, the specified hypothesis set is necessarily overestimated
due to the ratio form of the odds response. The results are consistent with
this explanation.
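
The arithmetic behind this explanation can be illustrated with a small
sketch. It assumes, purely for illustration, that a subject's plausibility
for the catch-all set is the sum of the plausibilities of the catch-all
hypotheses actually retrieved from memory. All probabilities and names below
are hypothetical.

```python
def odds_for_specified(specified_probs, catch_all_probs):
    """Odds in favor of the specified set against the catch-all set."""
    return sum(specified_probs) / sum(catch_all_probs)


# Hypothetical veridical plausibilities (they sum to 1.0 overall).
specified = [0.20, 0.15, 0.05]
catch_all_members = [0.20, 0.15, 0.10, 0.10, 0.05]

# Veridical odds use every hypothesis that belongs to the catch-all.
veridical_odds = odds_for_specified(specified, catch_all_members)

# An underpopulated catch-all: only the first two members are retrieved.
retrieved = catch_all_members[:2]
subjective_odds = odds_for_specified(specified, retrieved)

print(round(veridical_odds, 3), round(subjective_odds, 3))
```

In this example the data favor the catch-all (odds below 1.0), yet the
underpopulated catch-all yields odds above 1.0 in favor of the specified
set, the same reversal that the subjects showed.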

Difficulties in making plausibility estimates seem to be most pronounced
in the one-datum condition. Here the subjects' responses are significantly
more extreme than those in the three or six-datum conditions (F = 32.1;
df = 1,60; p < .001), while the difference between the multi-data
conditions is non-significant. This effect is probably due to the
constraints we operated under in creating the multi-data problems. In order
to maximize the number of students who had taken three or six classes to
obtain reliable relative frequency estimates, we were forced to use popular
lower-level introductory courses in most cases. While the majority of
courses in the multi-data problems were lower-level, the majority of
courses in the one-datum problems were upper-level. Evidently, the subjects
tended to believe that the upper-level courses were more diagnostic
than they actually were, thus producing this result.


The  significant  4-way  interaction  was  plotted  and  studied  but  does  not  seem 
particularly  interesting.  It  accounts  for  only  5%  of  the  variance  in  the 
study. The functions relating plausibility and number of data are quite
similar  regardless  of  values  of  priors  and  frames,  and  the  minor  changes  that 
do  occur  do  not  change  the  general  interpretation  of  figure  6. 

4.3.2 Plausibility assessment of individual hypotheses. A series of analyses
has been performed to examine how well subjects estimated the plausibility of
individual hypotheses. In many ways these results are similar to those
obtained from the previously-discussed plausibility judgments of entire
hypothesis sets.

The first analysis simply counted the number of conservative and excessive
judgments for all specified hypotheses and all catch-all hypotheses. This
analysis was based on the median responses to each problem. As there were
three specified hypotheses in each problem and 18 problems, there were 54
median judgments of specified hypotheses, and 18 judgments of catch-all
hypotheses. These results are presented for the "priors" and "no priors"
conditions in Table 4.


INSERT  TABLE  4 ABOUT  HERE 


The majority of plausibility estimates of specified hypotheses were
excessive, while the majority of the plausibility estimates for the catch-all
hypotheses were conservative. This result is what would be expected given the
previous data and the explanation for these excessive plausibility
judgments.

TABLE 4


PROPORTIONS OF EXCESSIVE AND CONSERVATIVE
JUDGMENTS IN FRAME ONE


                            EXCESSIVE    CONSERVATIVE     N

PRIORS FURNISHED
  SPECIFIED HYPOTHESES        .722          .278         54
  CATCH-ALL                   .167          .833         18

NO PRIORS
  SPECIFIED HYPOTHESES        .815          .185         54
  CATCH-ALL                   .167          .833         18


We also converted the subjects' odds estimates to plausibilities by
normalizing their estimates for frame one. We then calculated the median
response. This median subjective response was plotted against the veridical
values and a line of best fit was calculated, as is often done in a Bayesian
analysis. The slope of the line of best fit is a measure of calibration of
the subject. If the slope, b, is equal to 1.0 the subjects are well
calibrated in a Bayesian sense. Slopes greater than one are characterized as
"excessive", while slopes less than one are called "conservative". We tested
the null hypothesis that the slopes are equal to zero, and also calculated a
coefficient of correlation to characterize the variability or "noise" in the
estimates. The results of these analyses for frame one are displayed in
Table 5. One difference between these results and the results previously
discussed is the effect of knowledge of the prior probabilities. Contrary to
the thrust of Kahneman and Tversky's (1973) results, these subjects do
appear to be influenced by knowledge of prior probabilities in estimating
the plausibilities of individual hypotheses, while the effect of this
variable on their plausibility estimates of groups of hypotheses is
negligible. The greater slopes for the priors condition regression lines
shown in Table 5 suggest that subjects are better calibrated (i.e., less
conservative) when prior probability information is provided. A second
powerful effect is seen in the differences in slope between
specifically-named hypotheses and the catch-all. Specifically-named
hypotheses are judged much more plausible than the catch-all. In the "no
priors" condition the slopes differ by a factor of three. Graphically, this
difference is even more striking. The two distributions of plotted points
have very little overlap. We believe that this effect is due to the
relatively greater "availability" (Tversky and Kahneman, 1973, 1974) of the
specifically-named hypotheses vs. the catch-all hypothesis, as previously
discussed. Evidently, the specifically-named hypotheses are available in
memory, while the hypotheses in the catch-all are not readily available.
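
The normalization and line-fitting steps described above can be sketched as
follows. This is a minimal illustration using ordinary least squares; the
function names and the example values are hypothetical, and the report does
not specify the computational details of its own analysis.

```python
import math


def normalize(odds_estimates):
    """Convert a set of magnitude (odds) estimates for one problem into
    plausibilities that sum to one."""
    total = sum(odds_estimates)
    return [o / total for o in odds_estimates]


def line_of_best_fit(x, y):
    """Least-squares slope b and intercept a for y = b*x + a, plus the
    correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return b, a, r


def calibration_label(b, tolerance=1e-9):
    """Label a slope using the report's terms."""
    if b > 1.0 + tolerance:
        return "excessive"
    if b < 1.0 - tolerance:
        return "conservative"
    return "well calibrated"


plausibilities = normalize([3.0, 1.0, 1.0, 5.0])   # hypothetical estimates
veridical = [0.1, 0.2, 0.4, 0.6]                   # hypothetical values
subjective = [0.5 * v + 0.15 for v in veridical]   # hypothetical medians
b, a, r = line_of_best_fit(veridical, subjective)
print(plausibilities, round(b, 3), round(a, 3), calibration_label(b))
```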

One aspect of these results that may be confusing to some is how the majority
of the plausibility estimates can be excessive, while the slope of the
best-fitting line is less than one. The explanation is that the values of the
y-intercept are greater than zero, so most judgments lie above the positive
diagonal.
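
A short calculation makes this point concrete. A fitted line y = bx + a with
b less than one and a greater than zero lies above the diagonal y = x for
every x smaller than a / (1 - b). The sketch below applies this to the slopes
and intercepts reported in Table 5 for the specifically-named hypotheses; the
function name is hypothetical, and the calculation is an illustration rather
than part of the original analysis.

```python
def crossover(b, a):
    """Veridical plausibility at which the line y = b*x + a meets the
    diagonal y = x. Below this value the fitted estimate is excessive."""
    return a / (1.0 - b)


# (slope b, intercept a) for specifically-named hypotheses, from Table 5.
table5_specified = {"priors": (0.507, 0.1276), "no priors": (0.397, 0.1800)}
crossovers = {name: crossover(b, a) for name, (b, a) in table5_specified.items()}
for name, x in crossovers.items():
    print(name, round(x, 3))
```

If most veridical plausibilities of individual hypotheses fall below these
crossover values, most judgments will be excessive even though the slope is
conservative.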


INSERT  TABLE  5 ABOUT  HERE 


TABLE 5


RESULTS OF REGRESSION AND CORRELATIONAL ANALYSES
OF FRAME ONE OF THE TASK


                            Line of Best Fit      Test That
                              (y = bx + a)          b = 0
                              b        a         t     df      p        Correlation

PRIORS CONDITION
  SPECIFICALLY-NAMED
  HYPOTHESES                .507     .1276     7.34    52    <.001         .713
  CATCH-ALL HYPOTHESIS      .298     .1437     2.10    16    <.05          .465

NO PRIORS CONDITION
  SPECIFICALLY-NAMED
  HYPOTHESES                .397     .1800     7.34    52    <.001         .713
  CATCH-ALL HYPOTHESIS      .122     .1844     1.53    16    .1>p>.05      .357


TABLE 6


RESULTS OF REGRESSION AND CORRELATIONAL ANALYSIS
ON FRAME TWO OF THE TASK


                            Line of Best Fit      Test That
                              (y = bx + a)          b = 0
                              b        a         t     df      p        Correlation

PRIORS CONDITION
  CATCH-ALL HYPOTHESIS      .412     .1709     2.95    16    <.005         .594

NO PRIORS CONDITION
  CATCH-ALL HYPOTHESIS      .2M      .2251     2.57    16    <.025         .540

NOTE: As there are only two judgments on frame two, the regression slope, the
t, and the correlation are the same for the group of specifically-named
hypotheses, since the probability of the specifically-named hypotheses is
necessarily one minus the probability of the catch-all.


TABLE 7


ORDINAL COMPARISONS OF THE PRIORS-NO PRIORS CONDITIONS MADE
BY COMPARING THE ORDINAL PROPERTIES OF THE SUBJECT'S
PLAUSIBILITY ESTIMATES WITH VERIDICAL PLAUSIBILITIES


PROPORTION OF TRIALS WHERE:              PRIORS    NO PRIORS

1) Correct Rank was Assigned
   to Most Likely Hypothesis
     Frame 1:                             .526       .492       NS
     Frame 2:                             .613       .631       NS

2) Correct Rank was Assigned
   to Least-Likely Hypothesis
   (Frame 1)                              .471       .462       NS

3) Correct Rank was Assigned
   to the Catch-All Hypothesis
   (Frame 1)                              .399       .332       NS


The correlation of .357 for the catch-all in the no priors condition is also
of interest. The square of this correlation is .127, which is the proportion
of the catch-all variance accounted for by the veridical plausibilities. In
this inference task, at least, there is little accuracy in the catch-all
estimates on frame one judgments. Gettys and Fisher have shown that
hypothesis generation is controlled by subjective plausibility, but evidently
these feelings of plausibility for the catch-all aren't very accurate!
Another interpretation of these results is that in the frame one judgments the
subject is not very concerned with the plausibility of the catch-all, but
rather tends to concentrate on the specifically-named hypotheses. Martin and
Gettys (1969) have reported similar results. If this is the case, then the
frame two catch-all results should show more effect due to the veridical
plausibilities, as the subject is confronted directly with the task of
evaluating two groups of hypotheses, one of which is the catch-all. Table 6
presents an analysis of the frame two plausibility estimates. An inspection
of Table 6 shows the same effect as noted previously in regard to the
priors-no priors manipulation; the no priors group is considerably less
well-calibrated, or more conservative. Here the veridical plausibilities
account for 29.2% of the variability in the "no priors" estimates, which is
an improvement over frame one, but is still far from excellent performance.
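
The proportions of variance quoted above follow directly from squaring the
correlations in Tables 5 and 6. The sketch below also checks the reported t
values against the standard identity relating a correlation to the t
statistic for a zero slope in simple regression. The report does not say how
its t values were computed, so the use of this identity is an assumption; the
function names are hypothetical.

```python
import math


def variance_explained(r):
    """Proportion of variance accounted for: the squared correlation."""
    return r * r


def t_for_correlation(r, df):
    """t statistic for the hypothesis of zero slope in simple regression,
    computed from the correlation r and the degrees of freedom df."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# Catch-all correlations reported in Tables 5 and 6 (df = 16 in each case).
print(round(variance_explained(0.357), 3))   # frame one, no priors
print(round(variance_explained(0.540), 3))   # frame two, no priors
for r in (0.357, 0.465, 0.594, 0.540):
    print(r, round(t_for_correlation(r, 16), 2))
```

Under this assumption the computed values agree with the tabled t values for
the four catch-all rows.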


INSERT  TABLE  6 ABOUT  HERE 


4.3.3 Ordinal properties of plausibility assessments. We have conducted a
series of analyses on the ordinal properties of the data. A good case could
be made for the adequacy of the unaided human if he can rank-order hypotheses
according to plausibility. By examining the ordinal relationships in his
estimates as compared to the veridical orderings of hypotheses we can assess
this component of his performance. Table 7 shows several interesting
comparisons.


INSERT  TABLE  7 ABOUT  HERE 



The tests of significance reported in Table 7 are chi-square statistics which
examine the priors-no priors difference, all of which are non-significant.

Obviously  the  subjects  are  ordering  the  hypotheses  at  a better  than  chance 
level. The chance expectation for frame one is .25 and for frame two is .50.

While  their  performance  is  better  than  chance,  it  is  far  from  perfect! 
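
The chance expectations quoted above can be checked by enumeration. The
sketch assumes that a subject responding at chance orders the alternatives
uniformly at random, with four alternatives on frame one (three named
hypotheses plus the catch-all) and two on frame two. The function name is
hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def chance_of_correct_rank(n_alternatives, target=0):
    """Probability that a uniformly random ordering of the alternatives
    places the target alternative in its correct rank."""
    orderings = list(permutations(range(n_alternatives)))
    hits = sum(1 for order in orderings if order[target] == target)
    return Fraction(hits, len(orderings))


frame_one_chance = chance_of_correct_rank(4)
frame_two_chance = chance_of_correct_rank(2)
print(float(frame_one_chance), float(frame_two_chance))
```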

The lack of significant differences between the priors and no priors
conditions is shown in Table 7. These results, combined with those presented
previously in Tables 5 and 6, suggest that the primary effect of the
"priors" manipulation is to change the calibration of the subject. As the
ordinal differences and the analysis of variance results are non-significant,
and as the primary priors-no priors effect on the regression line is a large
change in slope, b, it appears that the primary effect of knowing the priors
is to elevate the regression line in a veridical direction, thus improving the
calibration.


5.0  GENERAL  SUMMARY  OF  BOTH  EXPERIMENTS

It is perhaps useful to summarize what has been learned to date because a much
clearer picture of hypothesis generation is beginning to emerge.

Previous  research  suggested  that  if  new  evidence  reduces  the  plausibility  of 
the  current  hypothesis  set,  then  new  hypotheses  will  be  generated.  The 
Decision  Maker  searches  for  hypotheses  that  will  be  "leading  contenders"  in 
comparison to those that are already being entertained. The results of this
study by Gettys and Fisher (in preparation) support the idea that subjective
plausibility  controls  the  hypothesis  generation  process. 

The  memory  search  and  plausibility  estimation  studies  discussed  here  are  more 
concerned  with  the  subject's  hypothesis  generation  capabilities,  and  have 
increased  our  understanding  of  these  capabilities  considerably. 

The memory search experiment examined the key question in multi-data
hypothesis retrieval, i.e., how hypotheses which are consistent with various data
are retrieved. A memory-tagging model was developed, and a
consistency-checking notion was presented. These ideas were combined into a model of how
hypothesis  retrieval  from  memory  occurs.  This  model  was  used  to  arrive  at  the 
conclusion  that  hypotheses  are  retrieved  if  tagged  by  two  or  three  data.  A 
second,  equally  important  aspect  of  this  study  was  the  evaluation  of  memory 
retrieval  performance  with  the  minimally-adequate  hypothesis  set.  This 
evaluation  led  to  the  conclusion  that  the  hypothesis  retrieval  process  is  very 
inefficient,  and  that  a Decision  Maker  retrieves  far  fewer  hypotheses  than  he 
should. 

The  plausibility  estimation  study  shows  that  plausibility  estimation  also  is 
deficient,  and  that  part  of  this  deficiency  can  be  traced  to  inadequacies  in 
hypothesis  retrieval.  The  results  suggest  that  hypothesis  retrieval  is 
necessary in plausibility assessment in order to properly evaluate the
catch-all hypothesis. If the Decision Maker fails to retrieve many hypotheses for
the catch-all, he overestimates the plausibilities of the specified
hypotheses.

Paradoxically, if far too few hypotheses are generated, and if the
plausibility of these few hypotheses is overestimated, the Decision Maker is left in
a very vulnerable predicament where his hypothesis generation performance is
quite deficient and he is unaware of his deficiencies.

One  possible  remedy,  of  course,  would  be  to  improve  hypothesis  retrieval  by 
hypothesis retrieval aiding. This would both increase the number of
hypotheses that the Decision Maker entertains and simultaneously improve his
plausibility  assessments.  We  are  currently  evaluating  this  possibility. 


Appendix  A

A Model  Parameter  as  a Unique  Root 
to  a Nth  Degree  Polynomial 
Thomas Mehle and Dale Umbach

Running head: Unique Model Parameter

Binomial Measure


Abstract 

Presented in this paper are a derivation for a variable response
criterion model and a demonstration that the estimate of the criterion
is unique. This estimate may be of value in analyzing psychological
counting process models; an example application is discussed. The
possible problem with the estimate was that its uniqueness and hence its
utility was questionable. General proofs of the uniqueness of the
estimate are presented in traditional probabilistic notation and alternately
in decision theoretic terms.


Counting process models have played an important role in psychological
theories; the use of such models has been commonplace whenever a psychological
process may be profitably modeled by assuming that some underlying process
gives rise to discrete events which are counted. A response is made when
the count meets a criterion. Included among numerous examples in the
literature are models for the quantum theory of vision (Stevens, 1972; Pirenne
and Marriott, 1959), the availability model of human decision behavior
(Tversky and Kahneman, 1973), the accumulator model of reaction time behavior
(Pachella, 1974, p. 75; Kantowitz, 1974, p. 99; Audley and Pike, 1965;
LaBerge, 1962; McGill, 1962), memory retrieval models (see Ratcliff, 1978,
p. 74), signal detection models (Pike, 1973; Green and Swets, 1966, p. 136),
and a frequency model for verbal discrimination learning (Eckert and Kanak,
1974, p. 583).

One approach to modeling counting processes postulates that there are
variations in response criteria over trials for an individual and across
individuals (Gettys, Mendoza and Nicewander, Note 1). That is, rather than
the response criterion being fixed for all trials and all individuals, the
criterion is assumed to have a distribution function over a range of possible
values. The following section presents an example application of a variable
response criterion model in which the response criterion is assumed to be
binomially distributed. An attractive feature of the binomial distribution
is that although the parameter is a root of an nth degree polynomial, it is
uniquely determined in this application. Proofs of the uniqueness of the
$p$ estimate are presented in the final section.

An  Application 


In a study of the processes involved in retrieving hypotheses from
memory during a decision task, Gettys, Fisher and Mehle (Note 2) obtained
empirical probabilities that various hypotheses were used as responses on
trials for which subjects were given varying amounts of data. The notation
used here will be to let $M_{ij}(x)$ be the probability that the $j$th hypothesis
will be recalled from memory for at least $x$ data given that the subject was
presented a set of $i$ data, $0 \le x \le i$. Let $R_{ij}(x)$ be the probability that
the $j$th hypothesis will be recalled from memory for exactly $x$ data given
that the subject was presented $i$ data. Thus

$$M_{ij}(x) = \sum_{k=x}^{i} R_{ij}(k) \quad \text{and} \quad M_{ij}(0) = \sum_{k=0}^{i} R_{ij}(k) \equiv 1.$$

Figure 1 is a decision tree illustrating the calculation of $R_{ij}(x)$ and
$M_{ij}(x)$ in this application, when $i = 3$. Let $C_{ij}$ be the response
criterion, a random variable, for hypothesis $j$ given that $i$ data were
presented.


Insert Figure 1 about here


The criterion is used in a typical counting
process model decision rule: if the number of data (out of the set of $i$ data)
for which hypothesis $j$ is recalled is equal to or greater than $C_{ij}$, the
subject will give hypothesis $j$ as a response; otherwise, hypothesis $j$ will
not be used as a response. Let $P_j(i)$ be the probability that a subject
will give the $j$th hypothesis as a response given a set of $i$ data.

It should be noted that $M_{ij}(x)$, $R_{ij}(x)$, and $P_j(i)$ are also functions
of which data are presented. However, incorporating labels to indicate data
associations would needlessly complicate the notational scheme since data
labels are not crucial in this paper's formulations.

Gettys et al. (Note 2) employed a variable response criterion model to
predict $P_j(3)$ and $P_j(6)$ given $P_j(1)$. In the one-datum condition, subjects
were instructed to respond with all hypotheses recalled to allow an assumption
that the response criteria on single-datum trials were always one:
$R_{1j}(1) = P_j(1)$, $j = 1, 2, \ldots, J$.

Next, for a given $i$ and $j$, $M_{ij}(x)$ was calculated for $x = 0, 1, \ldots, i$;
see Figure 1. In this application, these computations were similar to using
the binomial probability mass function to calculate $P(X \ge x)$ for $i$ independent
Bernoulli trials, except that the trials were not Bernoulli because the success
probabilities, although known, were not constant across trials but varied
considerably as a function of which data were presented. However, by assuming
the trials were independent, $M_{ij}(x)$ could be calculated in the manner illustrated
in Figure 1. In other applications for which an assumption of constant success
probabilities would be reasonable, calculation of $M_{ij}(x)$ could be accomplished
with less tedium by using the binomial distribution.
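The path-by-path calculation in Figure 1 can be sketched in a few lines of Python. This is only an illustration under the independence assumption stated above, not the authors' procedure: the function names and the three single-datum recall probabilities are hypothetical.

```python
def exact_recall_probs(success_probs):
    """Return [R(0), ..., R(i)]: probability of exactly x recalls of one
    hypothesis over i independent data with unequal recall probabilities."""
    r = [1.0]  # with no data presented, zero recalls is certain
    for q in success_probs:
        nxt = [0.0] * (len(r) + 1)
        for x, mass in enumerate(r):
            nxt[x] += mass * (1.0 - q)   # this datum does not cue the hypothesis
            nxt[x + 1] += mass * q       # this datum cues the hypothesis
        r = nxt
    return r


def at_least_probs(r):
    """Return [M(0), ..., M(i)] with M(x) = sum of R(k) for k >= x."""
    m = []
    tail = 0.0
    for mass in reversed(r):
        tail += mass
        m.append(tail)
    return m[::-1]


# Hypothetical recall probabilities for one hypothesis given each of i = 3 data.
probs = [0.8, 0.5, 0.2]
R = exact_recall_probs(probs)   # approximately [0.08, 0.42, 0.42, 0.08]
M = at_least_probs(R)           # approximately [1.0, 0.92, 0.50, 0.08]
```

Building the distribution one datum at a time gives the same totals as summing the eight paths of the tree, and it avoids enumerating all paths when the number of data is larger.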


Gettys et al. (Note 2) made the assumption that for trials on which a
given hypothesis was used as a response, the criterion number of retrievals,
$C_{ij}$, was distributed binomially, with the following probability mass function:

$$f(c) = \binom{i}{c} p^{c} (1 - p)^{i - c} \qquad (1)$$

It should be noted that the $M_{ij}(x)$'s are monotone decreasing in $x$,
$x = 0, 1, 2, \ldots, i$. Assuming that $C_{ij}$ is distributed according to Eq. (1),
an expression for $P_j(i)$ is:

$$P_j(i) = \sum_{x=0}^{i} M_{ij}(x) \binom{i}{x} p^{x} (1 - p)^{i - x} \qquad (2)$$

By viewing Eq. (2) as a polynomial in $p$, Gettys et al. (Note 2) used
the empirical values of $M_{ij}(x)$ to solve for $p$.
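One way to solve Eq. (2) numerically is bisection on [0, 1]. The sketch below is an assumption about how such a solution might be computed, not a description of the method Gettys et al. used; the function names, the $M$ values, and the target response probability are hypothetical.

```python
from math import comb


def response_prob(m, p):
    """Right-hand side of Eq. (2): sum over x of M(x) C(i, x) p^x (1 - p)^(i - x)."""
    i = len(m) - 1
    return sum(m[x] * comb(i, x) * p**x * (1.0 - p) ** (i - x) for x in range(i + 1))


def solve_criterion_parameter(m, target, tol=1e-12):
    """Find p in [0, 1] with response_prob(m, p) == target by bisection.

    Relies on the right-hand side of Eq. (2) being decreasing in p.
    """
    lo, hi = 0.0, 1.0
    if not (response_prob(m, hi) <= target <= response_prob(m, lo)):
        raise ValueError("target lies outside [M(i), M(0)]; no root in [0, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if response_prob(m, mid) > target:
            lo = mid   # value still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


M = [1.0, 0.92, 0.50, 0.08]   # hypothetical M(0), ..., M(3)
p_hat = solve_criterion_parameter(M, target=0.60)
```

Bisection needs only the sign of the difference at each step, so it never leaves the interval of allowable values and avoids the spurious complex or out-of-range roots a general polynomial solver would return.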


The  Uniqueness  Problem 


A cursory examination of Eq. (2) would indicate that for any given values
of $P_j(i)$ and $M_{ij}(x)$, $p$ need not be a unique root. In fact, by the Fundamental
Theorem of Algebra, Eq. (2) must have $i$ roots in the complex number space.

If $p$ were not unique on [0, 1], in particular, its usefulness as a model
parameter would clearly be compromised. Following are two simple proofs of
the uniqueness of $p$, the first in decision theoretic terms and the second in
traditional probabilistic notation.

Let $f(p) = \sum_{x=0}^{i} M_{ij}(x) \binom{i}{x} p^{x} (1 - p)^{i - x}$. The first clue to the behavior
of $f(p)$ may be obtained by examining it at the endpoints of the interval
containing allowable $p$ values, namely [0, 1]:

$$f(0) = M_{ij}(0) = 1 \ge P_j(i) \qquad (3)$$

$$f(1) = M_{ij}(i) \le P_j(i) \qquad (4)$$

The inequality portion of expression (4) follows from the monotone behavior
of the $M_{ij}(x)$'s. Since $f(0) \ge P_j(i)$ and $f(1) \le P_j(i)$, and since $f(p)$ is
continuous, there must be at least one solution to $f(p) = P_j(i)$ for
$p \in [0, 1]$. Thus all that remains to be shown is that there is at most one
root.


The first proof, stated in decision theoretic notation, uses the fact
that the binomial $(n, p)$ family with fixed $n$ has the monotone likelihood
ratio (MLR) property but can be generalized to any family of distributions
with the MLR property, e.g. any exponential family with natural parameterization.
Define

$$\phi_l(x) = \begin{cases} 1 & \text{for } x < l \\ 0 & \text{for } x \ge l \end{cases} \qquad (5)$$

for $l = 1, 2, \ldots, i + 1$; and for $k = 1, 2, \ldots, i$, define

$$a_k = M_{ij}(k - 1) - M_{ij}(k); \quad a_{i+1} = M_{ij}(i). \qquad (6)$$

Note that one can write $M_{ij}(x) = \sum_{k=1}^{i+1} a_k \phi_k(x)$ for $x = 0, 1, \ldots, i$.
Now, $f(p)$ may be expressed as:

$$f(p) = \sum_{x=0}^{i} \sum_{k=1}^{i+1} a_k \phi_k(x) \binom{i}{x} p^{x} (1 - p)^{i - x} = \sum_{k=1}^{i+1} a_k \sum_{x=0}^{i} \phi_k(x) \binom{i}{x} p^{x} (1 - p)^{i - x} \qquad (7)$$

But, by Lehmann (1959, p. 68) each $\phi_k$ is a uniformly most powerful test of the
binomial parameter $p$ with fixed $n$ for $H_0$: $p \ge 1/2$ versus $H_1$: $p < 1/2$, and
$\sum_{x=0}^{i} \phi_k(x) \binom{i}{x} p^{x} (1 - p)^{i - x}$ is strictly decreasing in $p$ for $p \in [0, 1]$
for each $k = 1, 2, \ldots, i$. Since the $M_{ij}(x)$'s are monotone in $x$, each
$a_k \ge 0$ for $k = 1, 2, \ldots, i + 1$. So $f$ is strictly decreasing if $a_k > 0$
for some $k = 1, 2, \ldots, i$. But this is equivalent to:

$$0 < \sum_{k=1}^{i} a_k = M_{ij}(0) - M_{ij}(i) = 1 - M_{ij}(i). \qquad (8)$$

All that is required now is for all $M_{ij}(x)$'s to not be equal. This
requirement may be satisfied by observing that were all $M_{ij}(x)$'s equal to
some common value $M$, $P_j(i)$ must also equal $M$ and any value of $p$ would work
equally well. Thus, except for the preceding degenerate case, there is at
most one root of $f(p) = P_j(i)$ in [0, 1].
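The decomposition used in this proof can be checked numerically. The sketch below is an illustrative sanity check rather than part of the proof: the function names and the $M$ values are hypothetical. It evaluates $f(p)$ directly, evaluates it again as a weighted sum of binomial probabilities $P(X < k)$ with weights $a_k$, and confirms on a grid that the values fall as $p$ rises.

```python
from math import comb


def f_direct(m, p):
    """f(p) = sum over x of M(x) C(i, x) p^x (1 - p)^(i - x)."""
    i = len(m) - 1
    return sum(m[x] * comb(i, x) * p**x * (1 - p) ** (i - x) for x in range(i + 1))


def step_weights(m):
    """a_k = M(k - 1) - M(k) for k = 1..i, followed by a_{i+1} = M(i)."""
    i = len(m) - 1
    return [m[k - 1] - m[k] for k in range(1, i + 1)] + [m[i]]


def f_via_steps(m, p):
    """f(p) = sum over k of a_k * P(X < k), where X is Binomial(i, p)."""
    i = len(m) - 1
    a = step_weights(m)
    pmf = [comb(i, x) * p**x * (1 - p) ** (i - x) for x in range(i + 1)]
    return sum(a[k - 1] * sum(pmf[:k]) for k in range(1, i + 2))


M = [1.0, 0.92, 0.50, 0.08]             # hypothetical M(0), ..., M(3)
grid = [j / 100 for j in range(101)]    # p = 0.00, 0.01, ..., 1.00
values = [f_direct(M, p) for p in grid]
strictly_decreasing = all(u > v for u, v in zip(values, values[1:]))
```

Each $P(X < k)$ with $k \le i$ shrinks as $p$ grows, and the weights are non-negative, which is why the weighted sum can only decrease.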

Another proof that $f$ is strictly decreasing follows from a result in
Feller (1968, p. 173) that the binomial distribution function, $F(k; n, p)$,
can be represented:

$$F(k; n, p) = (n - k) \binom{n}{k} \int_{0}^{1 - p} t^{\,n-k-1} (1 - t)^{k} \, dt. \qquad (9)$$

Suppose $0 \le p_1 < p_2 \le 1$. Define $F_1$ and $F_2$ as

$$F_1(x) = F(x; i, p_1) \qquad (10)$$

$$F_2(x) = F(x; i, p_2). \qquad (11)$$

Then

$$f(p_1) = \int_{[0,\, i]} M_{ij}(x) \, dF_1(x) \qquad (12)$$

and

$$f(p_2) = \int_{[0,\, i]} M_{ij}(x) \, dF_2(x). \qquad (13)$$

Thus $f(p_1) > f(p_2)$ if and only if

$$\int_{[0,\, i]} M_{ij}(x) \, d[F_1 - F_2](x) > 0. \qquad (14)$$

However, integration by parts yields:

$$\int_{[0,\, i]} M_{ij}(x) \, d[F_1 - F_2](x) = M_{ij}(x) [F_1(x) - F_2(x)] \Big|_{0^-}^{i} - \int_{[0,\, i]} [F_1(x) - F_2(x)] \, dM_{ij}(x) \qquad (15)$$

$$= - \int_{[0,\, i]} [F_1(x) - F_2(x)] \, dM_{ij}(x)$$

$$= \sum_{k=0}^{i-1} [F_1(k) - F_2(k)] \, [M_{ij}(k) - M_{ij}(k+1)].$$

The boundary term vanishes because $F_1$ and $F_2$ are both zero below $x = 0$
and both equal to one at $x = i$.

Now, the monotone property implies that $M_{ij}(k) - M_{ij}(k+1) \ge 0$ and Eq.
(8) implies for some $k = 0, 1, \ldots, i-1$ that $M_{ij}(k) - M_{ij}(k+1) > 0$. Thus
the proof is completed once it is established that for $k = 0, 1, \ldots, i-1$

$$F_1(k) > F_2(k). \qquad (16)$$

By Eq. (9), however, $p_2 > p_1$ implies

$$F_1(k) = (i - k) \binom{i}{k} \int_{0}^{1 - p_1} t^{\,i-k-1} (1 - t)^{k} \, dt > (i - k) \binom{i}{k} \int_{0}^{1 - p_2} t^{\,i-k-1} (1 - t)^{k} \, dt = F_2(k), \qquad (17)$$

thus completing the proof.
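The integral representation of the binomial distribution function used in this second proof can be verified numerically. The sketch below is illustrative only: the function names, the midpoint-rule integration, and the parameter values are assumptions chosen for the check, and the representation applies for $k < n$.

```python
from math import comb


def binom_cdf(k, n, p):
    """F(k; n, p): probability of at most k successes in n Bernoulli(p) trials."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def feller_integral(k, n, p, steps=20000):
    """(n - k) C(n, k) times the integral from 0 to 1 - p of
    t^(n-k-1) (1 - t)^k dt, approximated by the midpoint rule (needs k < n)."""
    upper = 1.0 - p
    h = upper / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h            # midpoint of the j-th subinterval
        total += t ** (n - k - 1) * (1 - t) ** k
    return (n - k) * comb(n, k) * total * h


n, p1, p2 = 3, 0.3, 0.6              # hypothetical values with p1 < p2
max_gap = max(abs(binom_cdf(k, n, p1) - feller_integral(k, n, p1)) for k in range(n))
cdf_ordered = all(binom_cdf(k, n, p1) > binom_cdf(k, n, p2) for k in range(n))
```

Because the integrand is non-negative and the upper limit $1 - p$ shrinks as $p$ grows, the integral, and with it the distribution function at each $k < n$, decreases in $p$.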
Figure 1. The $P_j(R)$ symbol refers to the probability of retrieving
a hypothesis for datum $j$; $\bar{P}_j(R) = 1 - P_j(R)$. The R values are the number
of recalls of hypothesis $j$ for each numbered path through the tree. The
$R_{3j}(x)$ values are computed as follows: $R_{3j}(0)$ = the probability of path 8;
$R_{3j}(1)$ = the probability of path 4, 6 or 7; $R_{3j}(2)$ = the probability of path
2, 3 or 5; $R_{3j}(3)$ = the probability of path 1.


Appendix  B

Predicted versus Empirical
Hypothesis Recall Probabilities


Data and Hypotheses           Predicted   Empirical

Data: Beef, Fish and Aerospace Industry
  Texas                       .849        .898
  Louisiana                   .154        .082
  California                  .512        .694
  Florida                     .550        .796
  Georgia                     .042        .082
  Oklahoma                    .433        .204
  Colorado                    .176        .102
  New York                    .106        .122
  Oregon                      .160        .184
  Alabama                     .029        .082
  Hawaii                      .079        .082
  Missouri                    .047        .020
  Tennessee                   .018        .020
  Illinois                    .054        .020
  Wyoming                     .126        .020

Data: Citrus Fruit, Tourists, Cypress Trees
  Florida                     .944        .959
  Georgia                     .249        .388
  California                  .857        .837
  Louisiana                   .281        .388
  Texas                       .512        .510
  Alabama                     .135        .245
  Hawaii                      .311        .163
  Arizona                     .203        .122
  New Mexico                  .128        .082
  Washington                  .114        .082
  Mississippi                 .164        .163
  Oregon                      .045        .061
  Nevada                      .160        .061
  Arkansas                    .026        .041

Data: Beef, Fish, Aerospace Industry, Citrus Fruit, Tourists, Cypress Trees
  Texas                       .779        .750
  Florida                     .877        .896
  California                  .780        .708
  Louisiana                   .178        .208
  Hawaii                      .143        .125
  Oregon                      .060        .125
  Georgia                     .092        [illegible]
  Oklahoma                    .199        .083
  Alabama                     .044        [illegible]

Data: Psychology I, U.S. History, Industrial Psychology
  Architecture                .020        .021
  English                     .095        .043
  Pre Med                     .083        .191
  Zoology                     .054        .043
  Biology                     .024        .021
  Pre Law                     .104        .170
  Sociology                   .211        .191
  Education                   .202        .234
  History                     .253        .426
  Mathematics                 .066        .021
  Chemistry                   .064        .064
  Business                    .206        .298
  Finance                     .042        .085
  Management                  .117        .106
  Economics                   .059        .043
  Accounting                  .089        .043
  Public Relations            .027        .043
  Political Science           .152        .149
  Engineering                 .154        .149
  Mechanical Engineering      .020        .021

Data: Design/Measurement of Work, Personnel Management, The Behavior of Organizations
  Psychology                  .378        .319
  Sociology                   .229        .255
  Education                   .080        .106
  Political Science           .140        .064
  Anthropology                .079        .043
  Advertising                 .039        .021
  Economics                   .094        .064
  Business                    .335        .532
  Management                  .352        .638
  Accounting                  .203        .298
  Finance                     .114        .170
  Business Administration     .137        .149
  Marketing                   .121        .149
  English                     .033        .021
  Engineering                 .184        .106
  Industrial Engineering      .039        .021

Data: Psychology I, U.S. History, Industrial Psychology, Design/Measurement
of Work, Personnel Management, The Behavior of Organizations
  Psychology                  .665        .755
  Political Science           .164        .184
  Sociology                   .264        .204
  Education                   .159        .143
  English                     .064        .020
  Business                    .327        .388
  Management                  .261        .490
  Accounting                  .145        .122
  Economics                   .071        .061
  Business Administration     .061        .122
  Finance                     .066        .082
  Engineering                 .189        .061

Data: Hammer, Drill, Saw
  Plumber                     .234        .152
  Carpenter                   .909        .978
  Mechanic                    .232        .239
  Electrician                 .114        .130
  Machinist                   .083        .108
  Metal/Steel Worker          .102        .065
  Welder                      .102        .065
  Construction                .206        .478
  Artist                      .027        .043
  Stagehand                   .021        .022
  Roofer                      .054        .065
  Dentist                     .096        .022
  Physician                   .072        .022
  Furniture Manufacturer      .096        .174
  Framer                      .049        .022
  Mason                       .032        .022
  Farmer                      .055        .043

Data: Wrench, Pipe Threader, Blow Torch
  Plumber                     .608        .804
  Carpenter                   .073        .109
  Mechanic                    .508        .304
  Electrician                 .140        .065
  Metal/Steel Worker          .107        .087
  Welder                      .286        .587
  Construction                .176        .197
  Oil Field Worker            .128        .130
  Air Conditioning Worker     .063        .043
  Pipeline Worker             .162        .196
  Engineer                    .021        .022
  Ship Builder                .030        .022
  Farmer                      .043        .022
  Machinist                   .089        .087

Data: Hammer, Drill, Saw, Wrench, Pipe Threader, Blow Torch
  Plumber                     .695        .934
  Carpenter                   .812        .739
  Mechanic                    .643        [illegible]
  Electrician                 .232        .217
  Machinist                   .156        .130
  Metal/Steel Worker          .192        .152
  Welder                      .368        .587
  Construction                .344        .261


Appendix C: Problems used in plausibility experiment
(Probabilities are expressed as percentages)

1 datum - Low catch-all plausibility

Problem 1. (Sample size = 104)

              Dept.-name   Course-no.   Course-title
Data:         Mgt.         4363         Organizational Behavior

              Major                   P(H)    P(H/D)
Hypotheses:   1) Management           5.7%     64.4%
              2) Political Sci.       1.9%      1.9%
              3) Law Enf. Admin.      1.1%      2.9%
              4) All others          91.3%     30.8%

Problem 2. (Sample size = 62)

              Dept.-name   Course-no.   Course-title
Data:         PSC          4803         Criminal Legal Pros.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Political Sci.       1.9%     22.6%
              2) History               .9%      4.8%
              3) Law Enf. Admin.      1.1%     50.0%
              4) All others          96.1%     22.6%

1 datum - Medium catch-all plausibility

Problem 3. (Sample size = 152)

              Dept.-name   Course-no.   Course-title
Data:         Zoo.         3333         Genetics

              Major                   P(H)    P(H/D)
Hypotheses:   1) Chemistry            1.1%      9.2%
              2) Psychology           2.4%      9.2%
              3) Zoology              2.4%     34.9%
              4) All others          94.1%     46.7%

Problem 4. (Sample size = 127)

              Dept.-name   Course-no.   Course-title
Data:         PSC          3553         Prin. Criminal Inv.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Management           5.7%      7.1%
              2) Law Enf. Admin.      1.1%     31.5%
              3) Political Sci.       1.9%     18.1%
              4) All others          91.3%     43.3%

1 datum - High catch-all plausibility

Problem 5. (Sample size = 182)

              Dept.-name   Course-no.   Course-title
Data:         Math         3703         Elementary Stat.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Zoology              2.4%      2.7%
              2) Psychology           2.4%      1.1%
              3) Accounting           5.7%      1.1%
              4) All others          89.5%     95.1%

Problem 6. (Sample size = 276)

              Dept.-name   Course-no.   Course-title
Data:         Zoo.         2255         Human Anatomy

              Major                   P(H)    P(H/D)
Hypotheses:   1) Physical Therapy      .3%      4.0%
              2) Phys. Ed.            1.3%     14.9%
              3) Zoology              2.4%       .7%
              4) All others          96.0%     80.4%

3 data - Low catch-all plausibility

Problem 7. (Sample size = 97)

              Dept.-name   Course-no.   Course-title
Data:         1) Educ.     2424         School in Am. Culture
              2) Math      2214         Arith. for Elem. Teachers
              3) Psy.      1113         Elements of Psych.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Sociology            1.9%      1.0%
              2) All Education        8.1%     90.7%
              3) English               .7%      1.0%
              4) All others          89.3%      7.2%

Problem 8. (Sample size = 186)

              Dept.-name   Course-no.   Course-title
Data:         1) Phil.     1203         Phil. Soc. and Relig. Morality
              2) Psy.      1113         Elements of Psy.
              3) Educ.     2424         School in Am. Culture

              Major                   P(H)    P(H/D)
Hypotheses:   1) All Education        8.1%     57.0%
              2) All Business        21.0%     10.0%
              3) Recreation            .5%      3.2%
              4) All others          70.4%     29.6%

3 data - Medium catch-all plausibility

Problem 9. (Sample size = 168)

              Dept.-name   Course-no.   Course-title
Data:         1) Bot.      1114         Gen. Botany
              2) Chem.     1314         Gen. Chem.
              3) Chem.     3053         Organic Chem.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Microbiology         1.1%     13.7%
              2) All Engineering     12.1%     11.3%
              3) Zoology              2.4%     16.1%
              4) All others          84.4%     58.9%


Problem 10. (Sample size = 133)

              Dept.-name   Course-no.   Course-title
Data:         1) Zoo.      2094         Invert. Zoo.
              2) Zoo.      1121         Intro. Zoo. Lab.
              3) Phys.     2414         Gen. Phys: Mech, Sound, Heat

              Major                   P(H)    P(H/D)
Hypotheses:   1) Lab. Technology       .6%      3.8%
              2) Microbiology         1.1%      7.5%
              3) Zoology              2.4%     44.4%
              4) All others          95.9%     44.4%

3 data - High catch-all plausibility

Problem 11. (Sample size = 263)

              Dept.-name   Course-no.   Course-title
Data:         1) Math      1823         Calculus I
              2) Chem.     1314         Gen. Chem.
              3) Phil.     1203         Phil. Soc. and Relig. Morality

              Major                   P(H)    P(H/D)
Hypotheses:   1) Zoology              2.4%      5.3%
              2) Geology              1.2%      6.8%
              3) Petrol. Engin.       2.1%      7.6%
              4) All others          94.3%     80.2%

Problem 12. (Sample size = 452)

              Dept.-name   Course-no.   Course-title
Data:         1) Zoo.      1121         Intro. Zool. Lab.
              2) Chem.     1314         Gen. Chem.
              3) Phys.     2414         Gen. Phys: Mech, Sound, Heat

              Major                   P(H)    P(H/D)
Hypotheses:   1) Microbiology         1.1%      9.1%
              2) Pharmacy              .8%      7.7%
              3) Zoology              2.4%     16.6%
              4) All others          95.7%     66.6%

6 data - Low catch-all plausibility

Problem 13. (Sample size = 293)

              Dept.-name   Course-no.   Course-title
Data:         1) Math.     2423         Calculus II
              2) Math      1823         Calculus I
              3) Chem.     1314         Gen. Chem.
              4) Chem.     3053         Organic Chem.
              5) Engr.     2514         Gen. Phys for Eng. and Sci.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Chemistry            1.1%      2.7%
              2) Accounting           6.9%      2.0%
              3) All engineering     12.1%     78.5%
              4) All others          79.9%     16.7%

Problem 14. (Sample size = 39)

              Dept.-name   Course-no.   Course-title
Data:         1) Math      1444         Elem. Func. & Coor. Geom.
              2) Math      1513         College Algebra
              3) Acct.     2133         Elem-Accounting I
              4) Econ.     3113         Intermed. Price Theory
              5) Hist.     1483         U.S. 1492 to 1865
              6) Psy.      1113         Elem. of Psych.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Psychology           2.4%      5.1%
              2) All Education        8.1%      7.7%
              3) All Business        21.0%     59.0%
              4) All others          68.5%     28.2%

6 data - Medium catch-all plausibility

Problem 15. (Sample size = 177)

              Dept.-name   Course-no.   Course-title
Data:         1) Econ.     2113         Prin. of Economics
              2) Hist.     1483         U.S. 1492-1865
              3) Hist.     1493         U.S. 1865-present
              4) PSC       1113         Gov't of U.S.
              5) Psy.      1113         Elements of Psych.
              6) Soc.      1113         Intro. to Sociology

              Major                   P(H)    P(H/D)
Hypotheses:   1) History               .9%      4.0%
              2) Zoology              2.4%      2.3%
              3) All Business        21.0%     35.0%
              4) All others          75.7%     58.7%

Problem 16. (Sample size = 158)

              Dept.-name   Course-no.   Course-title
Data:         1) Chem.     1614         Chem. for Non-Science
              2) Econ.     2113         Prin. of Economics
              3) Econ.     2843         Elements of Stat.
              4) Psy.      1113         Elements of Psych.
              5) Fin.      3303         Business Finance
              6) Math      1444         Elem. Func. and Coor. Geo.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Accounting           6.9%     32.9%
              2) Economics             .2%       .6%
              3) All Education        8.1%      2.5%
              4) All others          84.8%     64.0%

6 data - High catch-all plausibility

Problem 17. (Sample size = 470)

              Dept.-name   Course-no.   Course-title
Data:         1) Chem.     1314         Gen. Chem.
              2) Math      1823         Calculus I
              3) Math      2423         Calculus II
              4) Phys.     2514         Gen. Phys. for Eng. & Sci.
              5) PSC       1113         Gov't. of U.S.
              6) Engr.     1112         Intro. to Engineering

              Major                   P(H)    P(H/D)
Hypotheses:   1) Electrical Engr.     2.1%     16.0%
              2) Meteorology           .7%      2.6%
              3) Accounting           6.9%      2.3%
              4) All others          90.3%     79.1%

Problem 18. (Sample size = 58)

              Dept.-name   Course-no.   Course-title
Data:         1) Zoo.      1114         Intro. to Zool.
              2) Zoo.      1121         Intro. Zool. Lab.
              3) Phys.     2414         Gen. Phys: Mech, Sound, Heat
              4) Econ.     3113         Intermed. Price Theory
              5) Hist.     1483         U.S. 1492-1865
              6) Psy.      1113         Elements of Psych.

              Major                   P(H)    P(H/D)
Hypotheses:   1) Phys. Ed.            1.3%     12.1%
              2) Chemistry            1.1%      5.1%
              3) Zoology              2.4%     15.5%
              4) All others          95.2%     67.2%